Abstract: We develop an approach to machine learning and
anomaly detection via quantum adiabatic evolution. In the train- 
ing phase we identify an optimal set of weak classifiers, to form 
a single strong classifier. In the testing phase we adiabatically 
evolve one or more strong classifiers on a superposition of inputs 
in order to find certain anomalous elements in the classification 
space. Both the training and testing phases are executed via quan- 
tum adiabatic evolution. We apply and illustrate this approach 
in detail to the problem of software verification and validation. 



I. Introduction 

Machine learning is a field of computational research
with broad applications, ranging from image processing 
to analysis of complex systems such as the stock market. There 
is abundant literature concerning learning theory in the clas- 
sical domain, addressing speed and accuracy of the learning 
process for different classes of concepts [1]. Groundwork for machine learning using quantum computers has also been laid, showing that quantum machine learning, while requiring as much input information as classical machine learning, may be faster and is capable of handling concepts beyond the reach of any classical learner [2], [3].

We consider the machine learning problem of binary clas- 
sification, assigning a data vector to one of two groups based 
on criteria derived from a set of training examples provided 
to the algorithm beforehand. The learning method we use is 
boosting, whereby multiple weak classifiers are combined to 
create a strong classifier formula that is more accurate than 
any of its components alone [4], [5]. This method can be
applied to any problem where the separation of two groups 
of data is required, whether it is distinguishing two species of 
plants based on their measurements or picking out the letter 
"a" from all other letters of the alphabet when it is scanned. 
Our approach to classification is based on recent efforts in 
boosting using adiabatic quantum optimization (AQO) which 
showed advantages over classical boosting in the sparsity 
of the classifiers achieved and their accuracy (for certain 
problems) [6], [7].

As a natural outgrowth of the classification problem, we 
also formulate a scheme for anomaly detection using quantum 
computation. Anomaly detection has myriad uses, some exam- 
ples of which are detection of insider trading, finding faults in 
mechanical systems, and highlighting changes in time-lapse
satellite imagery [8]. Specifically, we pursue the verification
and validation (V&V) of classical software, with programming 
errors as the anomalies to be detected. This is one of the 
more challenging potential applications of quantum anomaly 
detection, because programs are large, complex, and highly 
irregular in their structure. However, it is also an important 
and currently intractable problem for which even small gains 
are likely to yield benefits for the software development and 
testing community. 

The complexity of the V&V problem is easily understood 
by considering the number of operations necessary for an 
exhaustive test of a piece of software. Covering every possible 
set of inputs that could be given to the software requires a 
number of tests that is exponential in the number of input 
variables, not counting the complexity of each individual test [9]. Although exhaustive testing is infeasible, the cost of this infeasibility is large: in 2002, NIST estimated that tens of billions of dollars were lost due to inadequate testing [10].

The subject of how to best implement software testing 
given limited resources has been widely studied. Within this 
field, efforts focused on combinatorial testing have found 
considerable success and will be relevant to our new approach. 
Combinatorial testing focuses on using the available test attempts to test all combinations of up to a small number, $t$, of variables, with the idea that errors are usually caused by the interaction of only a few parameters [11], [12]. This approach has proven effective in practice [13], [14], with a number of tests that scales logarithmically in $n$, the number of software parameters, and exponentially in $t$.
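To make the scaling argument concrete, the sketch below greedily builds a pairwise ($t = 2$) covering array for $n$ binary parameters: a set of test vectors in which every combination of values for every pair of variables appears at least once. It is a simple greedy construction assumed here for illustration, not the method of [11]-[14]; the function names are hypothetical, and the exhaustive candidate pool restricts it to small $n$.

```python
from itertools import combinations, product

def interactions(test, t):
    """All t-way interactions (variable indices, values) exercised by one test vector."""
    return {(idx, tuple(test[i] for i in idx))
            for idx in combinations(range(len(test)), t)}

def greedy_covering_array(n, t):
    """Greedily pick binary test vectors until every t-way interaction is covered.

    The candidate pool is all 2**n input vectors, so this only scales to small n.
    """
    uncovered = {(idx, vals)
                 for idx in combinations(range(n), t)
                 for vals in product((0, 1), repeat=t)}
    candidates = list(product((0, 1), repeat=n))
    tests = []
    while uncovered:
        # Choose the candidate covering the most still-uncovered interactions.
        best = max(candidates, key=lambda c: len(interactions(c, t) & uncovered))
        tests.append(best)
        uncovered -= interactions(best, t)
    return tests

tests = greedy_covering_array(n=10, t=2)
print(f"{len(tests)} tests cover all pairwise interactions (exhaustive: {2 ** 10})")
```

For $n = 10$ the greedy choice returns a handful of test vectors rather than the $2^{10} = 1024$ required for exhaustive testing; it is not guaranteed to be the smallest possible covering array.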

Currently, the use of formal methods in the coding and verification phases of software development is the only way to guarantee absolute correctness of software without implementing exhaustive testing. However, formal methods are also expensive and time-consuming to implement. Model checking, a method of software analysis which aims to ensure the validity of all reachable program states, solves $n$-bit satisfiability problems (which are NP-complete), with $n$ as a function of the number of reachable states of the program [15]. Theorem proving, where a program is developed alongside a proof of its own correctness, requires repeated interaction and correction from the developer as the proof is formed, with the intermediate machine-provable lemmas checked with a satisfiability solver [16].

We propose a new approach to verification and validation of software which makes use of quantum information processing. The approach consists of a quantum learning step and a quantum testing step. In the learning step, our strategy uses quantum optimization to learn the characteristics of the program being tested and the specification it is being tested against. This learning technique is known as quantum boosting and has been previously applied to other problems, in particular image recognition [6], [17]-[19]. Boosting consists of building up a formula to accurately sort inputs into one of two groups by combining simple rules that sort less accurately, and in its classical forms has been frequently addressed in the machine learning literature [4], [5], [20].

The testing step is novel, and involves turning the classi- 
fying formulas generated by the learning step into a function 
that generates a lower energy the more likely its input is to 
represent a software error. This function is translated into the 
problem Hamiltonian of an adiabatic quantum computation 
(AQC). The AQC allows all potential software errors (indeed, 
as we will see, all possible operations of the software) to 
be examined in quantum-parallel, returning only the best 
candidates for errors which correspond to the lowest values 
of the classification function. 

Both the learning and testing steps make use of AQC. An 
adiabatic quantum algorithm encodes the desired result in the 
ground state of some problem Hamiltonian. The computation 
is then performed by initializing a physical system in the 
easily prepared ground state of a simpler Hamiltonian, then 
slowly changing the control parameters of the system so the 
system undergoes an adiabatic evolution to the ground state 
of the difficult-to-solve problem Hamiltonian [21], [22]. The 
adiabatic model of quantum computation is known to be 
universal and equivalent to the circuit model with a polynomial 
conversion overhead [23], [24]. While it is not known at this time how to make AQC fault tolerant, several error correction and prevention protocols have been proposed for AQC [25], [26], and it is known to exhibit a certain degree of natural robustness [27], [28].

In this article, Section II will begin by establishing the framework through which the quantum V&V problem is attacked, and by defining the programming errors we seek to eliminate. As we proceed with the development of a method for V&V using quantum resources, Section III will establish an implementation of the learning step as an adiabatic quantum algorithm. We develop conditions for ideal boosting and an alternate quantum learning algorithm in Section IV. The testing step will be detailed in Section V. We present simulated results of the learning step on a sample problem in Section VI and finish with our conclusions and suggestions for future work in Section VII.



II. Formalization 

In this section we formalize the problem of software error 
detection by first introducing the relevant vector spaces and 
then giving a criterion for the occurrence of an error. 



A. Input and output spaces 

Consider an "ideal" software program $P$, where by ideal we mean the correct program which a perfect programmer would have written. Instead we are faced with the real-life implementation of $P$, which we denote by $\hat{P}$ and refer to as the "implemented program." Suppose we wish to verify the operation of $\hat{P}$ relative to $P$. All programs have input and output spaces $\mathcal{V}_{\rm in}$ and $\mathcal{V}_{\rm out}$, such that

$$ \hat{P} : \mathcal{V}_{\rm in} \mapsto \mathcal{V}_{\rm out} . \qquad (1) $$



Without loss of generality we can think of these spaces as being spaces of binary strings. This is so because the input to any program is always specified within some finite machine precision, and the output is again given within finite machine precision (not necessarily the same as the input precision). Further, since we are only interested in inputs and outputs which take a finite time to generate (or "write down"), without loss of generality we can set upper limits on the lengths of allowed input and output strings. Within these constraints we can move to a binary representation for both input and output spaces, and take $N_{\rm in}$ as the maximum number of bits required to specify any input, and $N_{\rm out}$ as the maximum number of bits required to specify any output. Thus we can identify the input and output spaces as binary string spaces

$$ \mathcal{V}_{\rm in} = \{0,1\}^{N_{\rm in}} , \qquad \mathcal{V}_{\rm out} = \{0,1\}^{N_{\rm out}} . \qquad (2) $$



It will be convenient to concatenate elements of the input and output spaces into single objects. Thus, consider binary vectors $x = (x_{\rm in}, x_{\rm out})$, where $x_{\rm out} = P(x_{\rm in})$, consisting of program input-output pairs:

$$ x \in \{0,1\}^{N_{\rm in}} \times \{0,1\}^{N_{\rm out}} = \{0,1\}^{N_{\rm in} + N_{\rm out}} \equiv \mathcal{V} . \qquad (3) $$



B. Recognizing software errors 

1) Validity domain and range: We shall assume without loss of generality that the input spaces of the ideal and implemented programs are identical. This can always be ensured by extending the ideal program so that it is well defined for all elements of $\mathcal{V}_{\rm in}$. Thus, while in general not all elements of $\mathcal{V}_{\rm in}$ have to be allowed inputs into $P$ (for example, an input vector that is out of range for the ideal program), one can always reserve some fixed value for such inputs (e.g., the largest vector in $\mathcal{V}_{\rm out}$) and trivially mark them as errors. The ideal program $P$ is thus a map from the input space to the space $\mathcal{R}_{\rm out}$ of correct outputs:

$$ P : \mathcal{V}_{\rm in} \mapsto \mathcal{R}_{\rm out} \subset \mathcal{V}_{\rm out} . \qquad (4) $$



More specifically, $P$ computes an output string $x_{\rm out}$ for every input string $x_{\rm in}$, i.e., we can write $x_{\rm out} = P(x_{\rm in})$. Of course this map can be many-to-one (non-injective and surjective), but not one-to-many (multi-valued).¹ The implemented program $\hat{P}$ should ideally compute the exact same function. In reality it may not. With this in mind, the simplest way to identify a software error is to find an input vector $x_{\rm in}$ such that

$$ \| \hat{P}(x_{\rm in}) - P(x_{\rm in}) \| \neq 0 \qquad (5) $$

in some appropriate norm. This is clearly a sufficient condition for an error, since the implemented program must agree with the ideal program on all inputs. However, for our purposes a more general approach will prove to be more suitable.

¹Random number generation may appear to be a counterexample, as it is multi-valued, but only over different calls to the random-number generator.
2) Specification and implementation sets: A direct way to think about the existence of errors in a software program is to consider two sets of ordered input-output pairs within the space $\mathcal{V}$. These are the set $S$ of correct input-output pairs according to the program specification $P$, and the set $\hat{S}$ of input-output pairs implemented by the real program $\hat{P}$. We call $S$ the "specification set" and $\hat{S}$ the "implementation set". The program under test is correct when

$$ S = \hat{S} . \qquad (6) $$



That is, in a correct program, the specification set of correct 
input-output pairs is exactly the set that is implemented in 
code. 

As stated, Eq. (6) is impractical since it requires knowledge of the complete structure of the intended input and output spaces. Instead, we can also use the specification and implementation sets to give a correctness criterion for a given input-output pair:

Definition 1. A vector $x \in \mathcal{V}$ is erroneous and implemented if

$$ x \notin S \ \ \&\ \ x \in \hat{S} . \qquad (7) $$

Input-output vectors satisfying Eq. (7) are the manifestation of software errors ("bugs") and their identification is the main problem we are concerned with here. Conversely, we have

Definition 2. A vector $x \in \mathcal{V}$ is correct and implemented if

$$ x \in S \ \ \&\ \ x \in \hat{S} . \qquad (8) $$

Input-output vectors satisfying Eq. (8) belong to the "don't-worry" class. The two other possibilities belong to the "don't-care" class:

Definition 3. A vector $x \in \mathcal{V}$ is correct and unimplemented if

$$ x \in S \ \ \&\ \ x \notin \hat{S} . \qquad (9) $$

Definition 4. A vector $x \in \mathcal{V}$ is erroneous and unimplemented if

$$ x \notin S \ \ \&\ \ x \notin \hat{S} . \qquad (10) $$



A representation of the locations of vectors satisfying the four definitions for a sample vector space can be found in Fig. 1. Our focus will be on the erroneous vectors of Definition 1.

Note that Eq. (5) implies that the vector is erroneous and implemented, i.e., Definition 1. Indeed, let $x_{\rm out} = \hat{P}(x_{\rm in})$, i.e., $x = (x_{\rm in}, x_{\rm out}) \in \hat{S}$, but assume that $x_{\rm out} \neq x'_{\rm out}$, where $x'_{\rm out} = P(x_{\rm in})$. Then $x \notin S$, since $x_{\rm in}$ pairs up with $x'_{\rm out}$ in $S$. Conversely, Definition 1 implies Eq. (5). To see this, assume that $x = (x_{\rm in}, x_{\rm out}) \in \hat{S}$ but $x \notin S$. This must mean that $x_{\rm out} \neq x'_{\rm out}$, again because $x_{\rm in}$ pairs up with $x'_{\rm out}$ in $S$. Thus Eq. (5) is in fact equivalent to Definition 1, but does not capture the other three possibilities captured by Definitions 2-4.
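The four definitions, and the equivalence between Eq. (5) and Definition 1, can be checked by brute force on a toy example. In the sketch below a hypothetical 3-bit parity specification stands in for $P$ and an implementation with one deliberately planted bug stands in for $\hat{P}$; the names `ideal`, `implemented` and `region` are illustrative only, and input-output pairs are represented as tuples `(x_in, x_out)`.

```python
from itertools import product

N_IN = 3  # hypothetical input width in bits

def ideal(x_in):
    """Hypothetical specification P: output the parity of the input bits."""
    return (sum(x_in) % 2,)

def implemented(x_in):
    """Hypothetical implementation P-hat with a planted bug on input (1, 1, 1)."""
    out = ideal(x_in)
    return (1 - out[0],) if x_in == (1, 1, 1) else out

inputs = list(product((0, 1), repeat=N_IN))
V = [(x_in, (b,)) for x_in in inputs for b in (0, 1)]   # all input-output pairs
S = {(x_in, ideal(x_in)) for x_in in inputs}            # specification set
S_hat = {(x_in, implemented(x_in)) for x_in in inputs}  # implementation set

def region(x):
    """Return 1-4 according to Definitions 1-4 (the regions of Fig. 1)."""
    in_S, in_S_hat = x in S, x in S_hat
    if in_S_hat:
        return 2 if in_S else 1   # correct / erroneous, and implemented
    return 3 if in_S else 4       # correct / erroneous, and unimplemented

bugs = {x for x in V if region(x) == 1}
# Eq. (5): inputs on which the implementation disagrees with the specification.
eq5 = {(x_in, implemented(x_in)) for x_in in inputs
       if implemented(x_in) != ideal(x_in)}
print(bugs == eq5, bugs)
```

The single planted bug appears as exactly one region-1 vector, and the same pair is recovered from the output comparison of Eq. (5); the correct output on the buggy input lands in region 3.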




Fig. 1: Schematic vector space representation showing regions 
of vectors satisfying the four definitions. Region 1, of er- 
roneous but implemented vectors, is the location of errors. 
Regions 2, 3, and 4 represent vectors which are correct and 
implemented, correct and unimplemented, and erroneous and 
unimplemented, respectively. 



Definitions 1-4 will play a central role in our approach to quantum V&V.

3) Generalizations: Note that it may well be advantageous in practice to consider a more general setup, where instead of studying only the map from the input to the output space, we introduce intermediate maps which track intermediate program states. This can significantly improve our error classification accuracy.² Formally, this would mean that Eq. (4) is replaced by

$$ P : \mathcal{V}_{\rm in} \mapsto \mathcal{I}_1 \mapsto \cdots \mapsto \mathcal{I}_J \mapsto \mathcal{R}_{\rm out} , \qquad (11) $$

where $\{\mathcal{I}_j\}_{j=1}^{J}$ are intermediate spaces. However, we shall not consider this more refined approach in this work.

As a final general comment, we reiterate that a solution of 
the problem we have defined has implications beyond V&V. 
Namely, Definitions 1-4 capture a broad class of anomaly
(or outlier) detection problems [8]. From this perspective
the approach we detail in what follows can be described as 
"quantum anomaly detection," and could be pursued in any 
application which requires the batch processing of a large data 
space to find a few anomalous elements. 

III. Training a quantum software error classifier 

In this section we discuss how to identify whether a given set of input-output pairs is erroneous or correct, and implemented or unimplemented, as per Definitions 1-4. To this end
we shall require so-called weak classifiers, a strong classifier, a 
methodology to efficiently train the strong classifier, and a way 
to efficiently apply the trained strong classifier on all possible 
input-output pairs. Both the training step and the application 
step will potentially benefit from a quantum speedup. 

²One important consideration is that, as we shall see below, for practical reasons we may only be able to track errors at the level of one-bit errors and correlations between bit-pairs. Such limited tracking can be alleviated to some extent by using intermediate spaces, where higher-order correlations between bits appearing at the level of the output space may not yet have had time to develop.
A. Weak classifiers

Consider a class of functions which map from the input-output space to the reals:

$$ h_i : \mathcal{V} \mapsto \mathbb{R} . \qquad (12) $$

We call these functions "weak classifiers" or "feature detectors," where $i \in \{1, \ldots, N\}$ enumerates the features. These are some predetermined useful aggregate characteristics of the program $P$ which we can measure, such as total memory or average CPU time [29]. Note that $N$ will turn out to be the number of qubits we shall require in our quantum approach.

We can now formally associate a weak classification with 
each vector in the input-output space. 

Definition 5. Weak classification of $x \in \mathcal{V}$.

Weakly classified correct (WCC): a vector $x$ is WCC if $h_i(x) > 0$.

Weakly classified erroneous (WCE): a vector $x$ is WCE if $h_i(x) < 0$.

Clearly, there is an advantage to finding "smart" weak classifiers, so as to minimize $N$. This can be done by invoking heuristics, or via a systematic approach such as the one we present below.

For each input-output pair $x$ we have a vector $h(x) = (h_1(x), \ldots, h_N(x)) \in \mathbb{R}^N$. Such vectors can be used to construct geometric representations of the learning problem, e.g., a convex hull encompassing the weak classifier vectors of clustered correct input-output pairs. Such a computational geometry approach was pursued in [29].

We assume that we can construct a "training set"

$$ \mathcal{T} = \{x_s, y_s\}_{s=1}^{S} , \qquad (13) $$

where each $x_s \in \mathcal{V}$ is an input-output pair and $y_s = y(x_s) = +1$ iff $x_s$ is correct (whether implemented or not, i.e., $x_s \in S$), while $y_s = -1$ iff $x_s$ is erroneous (again, implemented or not, i.e., $x_s \notin S$). Thus, the training set represents the ideal program $P$, i.e., we assume that the training set can be completely trusted. Note that Eq. (4) presents us with an easy method for including erroneous input-output pairs, by deliberately misrepresenting the action of $P$ on some given input, e.g., by setting $x_{\rm out} \notin \mathcal{R}_{\rm out}(P)$. This is similar to the idea of performing V&V by building invariants into a program [30].

We are free to normalize each weak classifier so that $h_i \in [-1/N, 1/N]$ (the reason for this will become clear below). Given Definition 5, we choose the sign of each weak classifier so that $h_i(x_s) < 0$ for all erroneous training data, while $h_i(x_s) > 0$ for all correct training data. Each point $h(x_s) \in [-1/N, 1/N]^N$ (a hypercube) has associated with it a label $y_s$ which indicates whether the point is correct or erroneous. The convex hull approach to V&V [29] assumes that correct training points $h(x_s)$ cluster. Such an assumption is not required in our approach.
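One hypothetical way to realize this normalization and sign convention is sketched below. Each raw feature score is centred at the midpoint of its two class means on the training set, oriented so that correct data ($y_s = +1$) lie on the positive side on average, and rescaled into $[-1/N, 1/N]$. The centring rule and the function name are our assumptions, not part of the method; when the training data are not separable by a single feature, no choice of sign can satisfy the sign condition for every training point, and this recipe only enforces it on average.

```python
import numpy as np

def normalize_weak_classifiers(F, y):
    """Map raw feature scores F[s, i] to weak-classifier values h_i(x_s) in [-1/N, 1/N].

    Hypothetical recipe: centre feature i at the midpoint of its class means,
    flip its sign so that correct data (y = +1) lie on the positive side on
    average, then divide by N times the largest centred magnitude.
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y)
    N = F.shape[1]
    mu_correct = F[y == 1].mean(axis=0)
    mu_error = F[y == -1].mean(axis=0)
    theta = (mu_correct + mu_error) / 2                  # assumed threshold
    signs = np.where(mu_correct >= mu_error, 1.0, -1.0)  # orientation per classifier
    G = signs * (F - theta)
    scale = np.abs(G).max(axis=0)
    scale[scale == 0] = 1.0                              # constant feature: avoid 0/0
    return G / (N * scale)

# Hypothetical raw scores: column 0 = memory use, column 1 = CPU time.
F = [[10, 3], [12, 1], [30, 2], [28, 4]]
y = [1, 1, -1, -1]
h = normalize_weak_classifiers(F, y)
print(h)
```

In this toy data the first feature separates the classes perfectly, so it satisfies the sign condition on every training point; the second does not, and only its class averages end up on the correct sides of zero.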

B. Strong classifier 

We would like to combine all the weak classifiers into a single "strong classifier" which, given an input-output pair, will determine that pair's correctness or erroneousness. The problem is that we do not know in advance how to rank the weak classifiers by relative importance. We can formally solve this problem by associating a weight $w_i \in \mathbb{R}$ with each weak classifier $h_i$. The problem then becomes how to find the optimal set of weights, given the training set.

The process of creating a high-performance strong classifier from many less accurate weak classifiers is known as boosting in the machine learning literature. Boosting is a known method for enhancing to arbitrary levels the performance of known sets of classifiers that exhibit weak learnability for a problem, i.e., they are accurate on more than half of the training set [20], [31]. The most efficient method to combine weak classifiers into a strong classifier of a given accuracy is an open question, and there are many competing algorithms available for this purpose [32], [33]. Issues commonly considered in the development of such algorithms include identification of the data features that are relevant to the classification problem at hand [34], [35] and whether or not provisions need to be taken to avoid overfitting to the training set (causing poor performance on the general problem space) [36], [37]. We use an approach inspired by recent quantum boosting results on image recognition [6], [17]-[19]. This approach has been shown to outperform classical boosting algorithms in terms of accuracy (but not speed) on selected problems, and has the advantage of being implementable on existing quantum optimization hardware [38]-[41].

Since we shall map the $w_i$ to qubits we use binary weights $w_i \in \{0, 1\}$. It should be straightforward to generalize our approach to a higher-resolution version of real-valued $w_i$, using multiple qubits per weight.

Let $w = (w_1, \ldots, w_N) \in \{0, 1\}^N$, and let

$$ R_w(x) = w \cdot h(x) = \sum_{i=1}^{N} w_i h_i(x) \in [-1, 1] . \qquad (14) $$

This range is a direct result of the normalization $h_i \in [-1/N, 1/N]$ introduced above. We now define the weight-dependent "strong classifier"

$$ Q_w(x) = \mathrm{sign}[R_w(x)] \qquad (15) $$

and use it as follows:

Definition 6. Strong classification of $x \in \mathcal{V}$.

Strongly classified correct (SCC): a vector $x$ is SCC if $Q_w(x) = +1$.

Strongly classified erroneous (SCE): a vector $x$ is SCE if $Q_w(x) = -1$.

There is a fundamental difference between the "opinions" of the strong classifier, as expressed in Definition 6, and the actual erroneousness/correctness of a given input-output pair. The strong classifier associates an erroneous/correct label with a given input-output pair according to a weighted average of the weak classifiers. This opinion may or may not be correct. For the training set we actually know whether a given input-output pair is erroneous or correct. This presents us with an opportunity to compare the strong classifier to the training data. Namely, if $y_s Q_w(x_s) = -1$ then $Q_w(x_s)$ and $y_s$ have opposite sign, i.e., disagree, which means that $Q_w(x_s)$ mistakenly classified $x_s$ as a correct input-output pair while in fact it was erroneous, or vice versa. On the other hand, if $y_s Q_w(x_s) = +1$ then $Q_w(x_s)$ and $y_s$ agree, which means that $Q_w(x_s)$ is correct. Formally,

$$ y_s Q_w(x_s) = +1 \iff (x_s \text{ is SCC}) = \text{true} \ \text{or}\ (x_s \text{ is SCE}) = \text{true} , \qquad (16a) $$

$$ y_s Q_w(x_s) = -1 \iff (x_s \text{ is SCC}) = \text{false} \ \text{or}\ (x_s \text{ is SCE}) = \text{false} . \qquad (16b) $$
The higher the number of true instances is relative to the 
number of false instances, the better the strong classifier 
performance over the training set. The challenge is, of course, 
to construct a strong classifier that performs well also beyond 
the training set. To do so we must first solve the problem of 
finding the optimal set of binary weights w. 
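Evaluating a candidate weight vector on the training set amounts to computing $R_w(x_s)$, taking its sign, and counting the cases (16a) and (16b). The sketch below does this for hypothetical toy data with $S = 4$ training pairs and $N = 3$ weak classifiers already normalized to $[-1/3, 1/3]$; the function names are illustrative. NumPy's `sign` returns 0 when $R_w(x_s) = 0$, and such ties are counted here among the "false" cases, which is our convention rather than one fixed by the text.

```python
import numpy as np

def strong_classifier(w, h):
    """Q_w(x_s) = sign[R_w(x_s)] (Eqs. 14-15); h[s, i] = h_i(x_s), w in {0,1}^N."""
    R = h @ w                      # R_w(x_s), in [-1, 1] given the normalization
    return np.sign(R)              # numpy returns 0 when R_w(x_s) = 0

def count_true_false(w, h, y):
    """Numbers of training pairs with y_s Q_w(x_s) = +1 (Eq. 16a) and otherwise (16b)."""
    agree = y * strong_classifier(w, h) == 1   # ties R_w = 0 counted as 'false'
    return int(agree.sum()), int((~agree).sum())

# Hypothetical training data: rows are pairs x_s, columns are weak classifiers h_i.
h = np.array([[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, -1]]) / 3
y = np.array([1, 1, -1, -1])
print(count_true_false(np.array([1, 0, 1]), h, y))   # -> (3, 1)
print(count_true_false(np.array([1, 0, 0]), h, y))   # -> (4, 0)
```

Adding the third weak classifier to the first one turns a perfect training score into a tie on the third pair, which illustrates why more weak classifiers do not automatically give a better strong classifier.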

C. The formal weight optimization problem 

Let $H[z]$ denote the Heaviside step function, i.e., $H[z] = 0$ if $z < 0$ and $H[z] = 1$ if $z > 0$. Thus $H[-y_s Q_w(x_s)] = 1$ if the classification of $x_s$ is wrong, but $H[-y_s Q_w(x_s)] = 0$ if the classification of $x_s$ is correct. In this manner $H[-y_s Q_w(x_s)]$ assigns a penalty of one unit for each incorrectly classified input-output pair.

Consider

$$ L(w) = \sum_{s=1}^{S} H[-y_s Q_w(x_s)] . \qquad (17) $$

This counts the total number of incorrect classifications. Therefore minimization of $L(w)$ for a given training set $\{x_s, y_s\}_{s=1}^{S}$ will yield the optimal set of weights $w^{\rm opt} = \{w_i^{\rm opt}\}_{i=1}^{N}$.

However, it is important not to overtrain the classifier. Overtraining means that the strong classifier has poor generalization performance, i.e., it does not classify accurately outside of the training set [37], [42]. To prevent overtraining we can add a penalty proportional to the Hamming weight of $w$, i.e., to the number of non-zero weights $\|w\|_0 = \sum_{i=1}^{N} w_i$. In this manner an optimal balance is sought between the accuracy of the strong classifier and the number of weak classifiers comprising the strong classifier. The formal weight optimization problem is then to solve

$$ w^{\rm opt} = \arg\min_{w} \left[ L(w) + \lambda \|w\|_0 \right] , \qquad (18) $$

where $\lambda > 0$ can be tuned to decide the relative importance of the penalty.
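For very small $N$ the formal problem (18) can be solved by exhaustive enumeration over all $2^N$ binary weight vectors, which makes the roles of $L(w)$ and $\lambda$ visible. The sketch assumes $H[0] = 1$, so that a tie $R_w(x_s) = 0$ counts as a misclassification (the definition above leaves $H[0]$ unspecified); the toy data and function names are hypothetical. This brute-force search is exponential in $N$ and serves only as a reference point.

```python
from itertools import product
import numpy as np

def heaviside(z):
    """H[z], with the assumption H[0] = 1 (a tie R_w = 0 counts as an error)."""
    return (np.asarray(z) >= 0).astype(int)

def training_error(w, h, y):
    """L(w) of Eq. (17): number of misclassified training pairs."""
    Q = np.sign(h @ w)                        # strong classifier, Eq. (15)
    return int(heaviside(-y * Q).sum())

def formal_weights(h, y, lam):
    """Exhaustively solve Eq. (18): argmin_w [L(w) + lam * ||w||_0] over {0,1}^N."""
    best_w, best_cost = None, np.inf
    for bits in product((0, 1), repeat=h.shape[1]):
        w = np.array(bits)
        cost = training_error(w, h, y) + lam * w.sum()
        if cost < best_cost:
            best_w, best_cost = bits, cost
    return best_w, best_cost

# Hypothetical training data: weak classifier 1 is perfect, 2 and 3 are not.
h = np.array([[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, -1]]) / 3
y = np.array([1, 1, -1, -1])
w_opt, cost = formal_weights(h, y, lam=0.1)
print(w_opt, cost)   # -> (1, 0, 0) 0.1
```

With $\lambda > 0$ the sparsity penalty breaks ties in favour of fewer weak classifiers; here only the single perfect weak classifier is kept.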



D. Relaxed weight optimization problem

Unfortunately, the formulation of Eq. (18) is unsuitable for adiabatic quantum computation because of its discrete nature. In particular, the evaluation of the Heaviside function is not amenable to a straightforward implementation in AQC. Therefore, following [6], we now relax it by introducing a quadratic error measure, which will be implementable in AQC.

Let $y = (y_1, \ldots, y_S) \in \{-1, 1\}^S$ and $R_w = (R_w(x_1), \ldots, R_w(x_S)) \in [-1, 1]^S$. The vector $y$ is the ordered label set of correct/erroneous input-output pairs. The components $R_w(x)$ of the vector $R_w$ already appeared in the strong classifier (15). There we were interested only in their signs, and in Eq. (16) we observed that if $y_s R_w(x_s) < 0$ then $x_s$ was incorrectly classified, while if $y_s R_w(x_s) > 0$ then $x_s$ was correctly classified.

We can consider a relaxation of the formal optimization problem (18) by replacing the counting of incorrect classifications by a sum of the values of $y_s R_w(x_s)$ over the training set. This makes sense since we have normalized the weak classifiers so that $R_w(x) \in [-1, 1]$, while each label $y_s \in \{-1, 1\}$, so that all the terms $y_s R_w(x_s)$ are in principle equally important. In other words, the inner product $y \cdot R_w = \sum_{s=1}^{S} y_s R_w(x_s)$ is also a measure of the success of the classification, and maximizing it (making $y$ and $R_w$ as parallel as possible) should result in a strong classifier that performs well on the training set.

Equivalently, we can consider the distance between the vectors $y$ and $R_w$ and minimize it by finding the optimal weight vector $w^{\rm opt}$, in general different from that in Eq. (18). Namely, consider the squared Euclidean distance

$$ \delta(w) = \|y - R_w\|^2 = \sum_{s=1}^{S} \Big[ \sum_{i=1}^{N} w_i h_i(x_s) - y_s \Big]^2 = \sum_{i,j=1}^{N} C'_{ij} w_i w_j - 2 \sum_{i=1}^{N} C'_{iy} w_i + \|y\|^2 , \qquad (19) $$

where $h_i = (h_i(x_1), \ldots, h_i(x_S)) \in [-1/N, 1/N]^S$ and where

$$ C'_{ij} = h_i \cdot h_j = \sum_{s=1}^{S} h_i(x_s) h_j(x_s) , \qquad (20) $$

$$ C'_{iy} = h_i \cdot y = \sum_{s=1}^{S} h_i(x_s) y_s \qquad (21) $$

can be thought of as correlation functions. Note that they are symmetric: $C'_{ij} = C'_{ji}$ and $C'_{iy} = C'_{yi}$. The term $\|y\|^2 = S$ is a constant offset, so it can be dropped from the minimization.

If we wish to introduce a sparsity penalty as above, we can do so again, and thus ask for the optimal weight in the following sense:

$$ w^{\rm opt} = \arg\min_{w} \left[ \delta(w) + \lambda' \|w\|_0 \right] = \arg\min_{w} \Big[ \sum_{i,j=1}^{N} C'_{ij} w_i w_j + 2 \sum_{i=1}^{N} (\lambda - C'_{iy}) w_i \Big] , \qquad (22) $$

where $\lambda' = 2\lambda$.
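The expansion in Eq. (19) and the rewriting in Eq. (22) can be checked numerically. The sketch below computes $C'_{ij}$ and $C'_{iy}$ from random (hypothetical) weak-classifier outputs and confirms that $\delta(w) + 2\lambda\|w\|_0$ differs from the bracket of Eq. (22) by exactly the dropped constant $\|y\|^2 = S$ for every binary $w$; the helper names are illustrative.

```python
from itertools import product
import numpy as np

def correlations(h, y):
    """C'_ij (Eq. 20) and C'_iy (Eq. 21) from h[s, i] = h_i(x_s) and labels y_s."""
    return h.T @ h, h.T @ y

def delta(w, h, y):
    """Squared distance ||y - R_w||^2 of Eq. (19)."""
    return float(np.sum((y - h @ w) ** 2))

def relaxed_objective(w, Cp, Cpy, lam):
    """Bracket of Eq. (22): sum_ij C'_ij w_i w_j + 2 sum_i (lam - C'_iy) w_i."""
    return float(w @ Cp @ w + 2 * (lam - Cpy) @ w)

# Hypothetical random training data, already normalized to [-1/N, 1/N].
rng = np.random.default_rng(0)
S, N, lam = 20, 4, 0.05
h = rng.uniform(-1 / N, 1 / N, size=(S, N))
y = rng.choice([-1, 1], size=S)
Cp, Cpy = correlations(h, y)

# delta(w) + lam' ||w||_0 with lam' = 2 lam should equal Eq. (22) plus S.
gaps = []
for b in product((0, 1), repeat=N):
    w = np.array(b)
    gaps.append(delta(w, h, y) + 2 * lam * w.sum()
                - relaxed_objective(w, Cp, Cpy, lam))
print(np.allclose(gaps, S))   # -> True
```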



E. From QUBO to the Ising Hamiltonian 



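Equation (22) is a quadratic unconstrained binary optimization (QUBO) problem [17]. One more step is needed before we can map it to qubits, since we need to work with optimization variables whose range is $\{-1, 1\}$, not $\{0, 1\}$. Define new variables $q_i = 2(w_i - 1/2) \in \{-1, 1\}$. In terms of these new variables the minimization problem is

$$ \begin{aligned} q^{\rm opt} &= \arg\min_{q} \Big[ \frac{1}{4} \sum_{i,j=1}^{N} C'_{ij} (q_i + 1)(q_j + 1) + \sum_{i=1}^{N} (\lambda - C'_{iy})(q_i + 1) \Big] \\ &= \arg\min_{q} \Big[ \sum_{i,j=1}^{N} C_{ij} q_i q_j + \sum_{i=1}^{N} (\lambda - C_{iy}) q_i \Big] , \end{aligned} \qquad (23) $$

where in the second line we dropped the constant terms $\frac{1}{4} \sum_{i,j=1}^{N} C'_{ij} + \sum_{i=1}^{N} (\lambda - C'_{iy})$, used the symmetry of $C'_{ij}$ to write $\sum_{i,j=1}^{N} C'_{ij} q_j = \sum_{i,j=1}^{N} C'_{ij} q_i$, and where we defined

$$ C_{ij} = \frac{1}{4} C'_{ij} , \qquad C_{iy} = C'_{iy} - \frac{1}{2} \sum_{j=1}^{N} C'_{ij} . \qquad (24) $$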



Thus, the final AQC Hamiltonian for the quantum weight-learning problem is

$$ H_P = \sum_{i,j=1}^{N} C_{ij} Z_i Z_j + \sum_{i=1}^{N} (\lambda - C_{iy}) Z_i , \qquad (25) $$

where $Z_i$ is the Pauli spin-matrix $\sigma^z$ acting on the $i$th qubit. This represents Ising spin-spin interactions with coupling matrix $C_{ij}$, and an inhomogeneous magnetic field $\lambda - C_{iy}$ acting on each spin. Note how $H_P$ encodes the training data $\{h_i(x_s), y_s\}_{i,s}$ via the coupling matrix $C_{ij} = \frac{1}{4} \sum_{s=1}^{S} h_i(x_s) h_j(x_s)$ and the local magnetic field $C_{iy} = \sum_{s=1}^{S} h_i(x_s) y_s - \frac{1}{2} \sum_{s=1}^{S} h_i(x_s) \sum_{j=1}^{N} h_j(x_s)$. Thus, in order to generate $H_P$ one must first calculate the training data using the chosen set of weak classifiers.

In this final form [Eq. (25)], involving only one- and two-qubit $Z_i$ terms, the problem is now suitable for implementation on devices such as D-Wave's adiabatic quantum optimization processor [19], [39].

In Section IV-D we shall formulate an alternative weight optimization problem, based on a methodology we develop in Section IV for pairing weak classifiers to guarantee the correctness of the strong classifier.

F. Adiabatic quantum computation

The adiabatic quantum algorithm implements the time-dependent interpolation

$$ H(t) = s(t) H_I + [1 - s(t)] H_P , \qquad (26) $$

where $H_I$ is a Hamiltonian which does not commute with $H_P$ and should have a ground state (lowest-energy eigenvector) that is easily reachable, such as

$$ H_I = \sum_{i=1}^{N} (I - X_i) , \qquad (27) $$

where $I$ is the identity operator and $X_i$ is the Pauli $\sigma^x$ acting on the $i$th qubit [21], [22]. The interpolation function $s(t)$ satisfies the boundary conditions $s(0) = 1$, $s(T) = 0$, where $T$ is the final time. Provided the evolution is sufficiently slow (in a manner we shall quantify momentarily), the adiabatic theorem guarantees that the final state $|\psi(T)\rangle$ reached by the algorithm is, to a good approximation, the ground state of $H_P$, so that a measurement returns the energy-minimizing configuration with high probability [43]-[45]. This means that, for $H_P$ chosen as in Eq. (25), it finds as a ground state the optimal weights vector $q^{\rm opt}$ as defined in Eq. (23). These weights can then be "read off" by measuring the final states of each of the $N$ qubits:

$$ |\psi(T)\rangle \approx |q_1^{\rm opt}, \ldots, q_N^{\rm opt}\rangle = |q^{\rm opt}\rangle . $$
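The chain from Eq. (22) to Eq. (25), and the read-out of the weights from the ground state, can be checked on a few qubits by building $H_P$ explicitly with NumPy Kronecker products. The sketch below uses random (hypothetical) training data. Because $H_P$ is diagonal in the computational basis, its ground state is found here by inspecting the diagonal rather than by adiabatic evolution, and the resulting weights are compared with a brute-force minimization of Eq. (22). We assume that $q_i = +1$ corresponds to the $Z_i = +1$ basis state $|0\rangle$ and that qubit 0 is the leftmost tensor factor; the helper names are illustrative.

```python
from functools import reduce
from itertools import product
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def z_op(i, N):
    """Pauli Z on qubit i of N qubits (qubit 0 = leftmost tensor factor)."""
    return reduce(np.kron, [Z if k == i else I2 for k in range(N)])

def ising_coefficients(Cp, Cpy):
    """Eq. (24): C_ij = C'_ij / 4 and C_iy = C'_iy - (1/2) sum_j C'_ij."""
    return Cp / 4, Cpy - Cp.sum(axis=1) / 2

def problem_hamiltonian(C, Cy, lam):
    """Dense matrix of H_P in Eq. (25); it is diagonal in the computational basis."""
    N = len(Cy)
    Zs = [z_op(i, N) for i in range(N)]
    H = sum(C[i, j] * Zs[i] @ Zs[j] for i in range(N) for j in range(N))
    return H + sum((lam - Cy[i]) * Zs[i] for i in range(N))

def read_off_weights(H, N):
    """'Measure' the ground state: basis index -> spins q_i = +/-1 -> weights w_i."""
    k = int(np.argmin(np.diag(H)))            # exact, since H is diagonal
    q = np.array([1 - 2 * ((k >> (N - 1 - i)) & 1) for i in range(N)])
    return (q + 1) // 2                        # w_i = (q_i + 1) / 2

# Hypothetical random training data, normalized to [-1/N, 1/N].
rng = np.random.default_rng(1)
S, N, lam = 30, 4, 0.02
h = rng.uniform(-1 / N, 1 / N, size=(S, N))
y = rng.choice([-1, 1], size=S)
Cp, Cpy = h.T @ h, h.T @ y                    # Eqs. (20)-(21)

H_P = problem_hamiltonian(*ising_coefficients(Cp, Cpy), lam)
w_ground = read_off_weights(H_P, N)

def qubo(w):
    """Bracket of Eq. (22)."""
    return w @ Cp @ w + 2 * (lam - Cpy) @ w

w_qubo = np.array(min(product((0, 1), repeat=N), key=lambda b: qubo(np.array(b))))
print(w_ground, w_qubo)
```

For every binary $w$, the QUBO value and the matching diagonal entry of $H_P$ differ by the constant $\frac{1}{4}\sum_{i,j} C'_{ij} + \sum_i (\lambda - C'_{iy})$ that was dropped in Eq. (23), so both minimizations select the same weights.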

It should be noted that while the number of weak classifiers 
that can be selected from using this algorithm may appear to 
be limited by the number of qubits available for processing, 
this is not in fact the case. By performing multiple rounds of optimization, each time filling in the spaces left by classifiers that were not assigned weight in the previous round, an optimized group of N weak classifiers can be assembled. If the
performance of the strong classifier is unsatisfactory with N 
weak classifiers, multiple groups of N found in this manner 
may be used together. 

The scaling of the computation time $t_F$ with the number of qubits (or weak classifiers, in our case), $N$, is determined by the inverse of the minimal ground state energy gap of $H(t)$. There are many variants of the adiabatic theorem, differing mostly in assumptions about boundary conditions and differentiability of $H(t)$. Most variants state that, provided

$$ t_F > \frac{\|\dot{H}\|^{\alpha}}{\epsilon\, \Delta^{\beta}} , \qquad (28) $$

then

$$ |\langle \phi(t_F) | \psi(t_F) \rangle| \geq 1 - \epsilon^2 . \qquad (29) $$

The left-hand side of Eq. (29) is the fidelity of the actual state $|\psi(t_F)\rangle$ obtained under quantum evolution subject to $H(t)$ with respect to the desired final ground state $|\phi(t_F)\rangle$. More precisely, $|\psi(t)\rangle$ is the solution of the time-dependent Schrödinger equation $d|\psi(t)\rangle/dt = -iH(t)|\psi(t)\rangle$ (in $\hbar = 1$ units), and $|\phi(t)\rangle$ is the instantaneous ground state of $H(t)$, i.e., the solution of $H(t)|\phi(t)\rangle = E_0(t)|\phi(t)\rangle$, where $E_0(t)$ is the instantaneous ground state energy [the smallest eigenvalue of $H(t)$]. The parameter $\epsilon$, $0 < \epsilon < 1$, measures the quality of the overlap between $|\psi(t_F)\rangle$ and $|\phi(t_F)\rangle$, $\dot{H}$ is the derivative with respect to the dimensionless time $t/t_F$, and $\Delta$ is the minimum energy gap between the ground state $|\phi(t)\rangle$ and the first excited state of $H(t)$ (i.e., the difference between the two smallest equal-time eigenvalues of $H(t)$, for $t \in [0, t_F]$). The values of the integers $\alpha$ and $\beta$ depend on the assumptions made about the boundary conditions and differentiability of $H(t)$ [43]-[45]; typically $\alpha \in \{0, 1, 2\}$, while $\beta$ can be tuned between 1 and arbitrarily large values, depending on boundary conditions determining the smoothness of $H(t)$ (see, e.g., Theorem 1 in Ref. [45]). The crucial point is that the gap $\Delta$ depends on $N$, typically shrinking as $N$ grows, while the numerator $\|\dot{H}\|$ typically has a mild $N$-dependence (bounded in most cases by a function growing as $N^2$ [45]). Consequently a problem has an efficient, polynomial time solution under AQC if $\Delta$ scales as $1/\mathrm{poly}(N)$. However, note that an inverse exponential gap dependence on $N$ can still result in a speedup, as is the case, e.g., in the adiabatic implementation of Grover's search problem [46], [47], where the speedup relative to classical computation is quadratic.
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As for the problem we are concerned with here, finding the 
ground state of Hp as prescribed in Eq. ( |25j > in order to find 
the optimal weight set for the (relaxed version of the) problem 
of training a software error-classifier, it is not known whether 
it is amenable to a quantum speedup. A study of the gap 
dependence of our Hamiltonian H(t) on N, which is beyond 
the scope of the present work, will help to determine whether 
such a speedup is to be expected also in the problem at hand. A 
related image processing problem has been shown numerically 
to require fewer weak classifiers than in comparable classical 
algorithms, which gives the strong classifier a lower Vapnik- 
Chernovenkis dimension and therefore a lower generalization 
error [7], [18|. Quantum boosting applied to a different task, 
30-dimensional clustering, demonstrated increasingly better 
accuracy as the overlap between the two clusters grew than 
that exhibited by the classical AdaBoost algorithm [6|. More 
generally, numerical simulations of quantum adiabatic imple- 
mentations of related hard optimization problems (such as 
Exact Cover) have shown promising scaling results for N 
values of up to 128 (22), g(j), (48). We shall thus proceed 
here with the requisite cautious optimism. 

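The gap $\Delta$ that governs Eq. (28) can be examined numerically for very small instances. The following minimal sketch is an illustration only, not part of the method described here. It makes three assumptions:

- a linear schedule $H(s) = (1-s)H_B + sH_F$ with $s = t/t_F$;
- a transverse-field initial Hamiltonian $H_B = -\sum_i \sigma^x_i$;
- a hypothetical diagonal final Hamiltonian $H_F$.

It estimates $\Delta$ by exact diagonalization (NumPy's `eigvalsh`) on a grid of $s$ values. Exact diagonalization scales exponentially with $N$, so this is useful only for building intuition on toy problems.

```python
import numpy as np


def transverse_field(n_qubits: int) -> np.ndarray:
    """Dense matrix of H_B = -sum_i sigma^x_i on n_qubits qubits."""
    dim = 2 ** n_qubits
    h_b = np.zeros((dim, dim))
    for state in range(dim):
        for q in range(n_qubits):
            # sigma^x on qubit q flips bit q of the computational basis state
            h_b[state ^ (1 << q), state] -= 1.0
    return h_b


def minimum_gap(final_energies, steps: int = 401) -> float:
    """Estimate min over s in [0, 1] of E_1(s) - E_0(s) for
    H(s) = (1 - s) H_B + s H_F, using a uniform grid of s values.

    final_energies: diagonal of a hypothetical final Hamiltonian H_F,
    indexed by computational basis state (length must be a power of 2).
    """
    dim = len(final_energies)
    n_qubits = dim.bit_length() - 1
    assert 2 ** n_qubits == dim, "length must be a power of two"
    h_b = transverse_field(n_qubits)
    h_f = np.diag(np.asarray(final_energies, dtype=float))
    gaps = []
    for s in np.linspace(0.0, 1.0, steps):
        evals = np.linalg.eigvalsh((1.0 - s) * h_b + s * h_f)  # ascending order
        gaps.append(evals[1] - evals[0])
    return float(min(gaps))


if __name__ == "__main__":
    # Toy 3-qubit H_F with a unique ground state (basis state 3, energy 0).
    toy_energies = [3.0, 1.0, 4.0, 0.0, 5.0, 2.0, 6.0, 7.0]
    print(f"estimated minimum gap: {minimum_gap(toy_energies):.4f}")
```

As a sanity check, take a single qubit with $H_F = \mathrm{diag}(0, 1)$. The gap is $2\sqrt{s^2/4 + (1-s)^2}$, which is smallest at $s = 0.8$, where it equals $2\sqrt{0.2} \approx 0.894$. The grid reproduces this value.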
IV. Achievable strong classifier accuracy 

We shall show in this section that it is theoretically possible to construct a perfect, 100% accurate majority-vote strong classifier from a set of weak classifiers that are each more than 50% accurate. This requires that those weak classifiers relate to each other in exactly the right way. Our construction in this section is analytical and exact: we shall specify a set of conditions that the weak classifiers must satisfy for the strong classifier they comprise to be perfectly accurate. We shall also show how to construct an imperfect strong classifier, with bounded error probability, by relaxing the conditions we impose on the weak classifiers. We expect the quantum algorithm to find a close approximation to this result.

Consider a strong classifier with a general binary weight vector $w \in \{0,1\}^N$, as defined in Eq. (14). Our approach will be to show that the strong classifier in Eq. (14) is completely accurate if a set of three conditions is met. The conditions work by using pairs of weak classifiers which both classify some $x$ correctly and which disagree for all other $x$. An accurate strong classifier can be constructed by covering the entire space $V$ with the correctly classifying portions of such weak classifier pairs.

To start, every vector $x \in V$ has a correct classification, as determined by the specification set:

\begin{align}
x \in S &\iff y(x) = +1, \tag{30a}\\
x \notin S &\iff y(x) = -1. \tag{30b}
\end{align}

A strong classifier is perfect if

$$ Q_w(x) = y(x) \quad \forall x \in V. \tag{31} $$



The weak classifiers either agree or disagree with this correct classification. We define the correctness value of a weak classifier for a given input $x$:

$$ c_i(x) = h_i(x)\,y(x) = \begin{cases} +1 & h_i(x) = y(x) \\ -1 & h_i(x) \neq y(x) \end{cases} \tag{32} $$

Thus, similarly to the strong classifier case [Eq. (15)], we have, formally,

\begin{align}
c_i(x) = +1 &\iff (x \text{ is WCC}) = \text{true} \ \text{or} \ (x \text{ is WCE}) = \text{true}, \tag{33a}\\
c_i(x) = -1 &\iff (x \text{ is WCC}) = \text{false} \ \text{or} \ (x \text{ is WCE}) = \text{false}, \tag{33b}
\end{align}

where WCC and WCE stand for weakly classified correct and weakly classified erroneous, respectively (Definition 5).

A given input-output vector $x$ receives either a true or a false vote from each weak classifier comprising the strong classifier. Let us denote the index set of the weak classifiers comprising a given strong classifier by $\mathcal{I}$. If the majority of the votes given by the weak classifiers in $\mathcal{I}$ are true, then the vector receives a strong classification that is true. Let us loosely denote by $w \in \mathcal{I}$ the set of weak classifiers whose indices all belong to $\mathcal{I}$. Thus

$$ \sum_{i\in\mathcal{I}} c_i(x) > 0 \implies Q_w(x) = y(x) \quad \text{if } w \in \mathcal{I}. \tag{34} $$

It follows from Eqs. (31) and (34) that if we can find a set of weak classifiers for which $\sum_{i\in\mathcal{I}} c_i(x) > 0$ for all input-output vectors $x$, then the corresponding strong classifier is perfect. This is what we shall set out to do in the next subsection.

A. Conditions for complete classification accuracy 

First, we limit our working set to those weak classifiers with greater than 50% accuracy. This is a prerequisite for the feasibility of the other conditions. To ensure that at least half the initial dictionary of weak classifiers is at least 50% accurate, we include in the dictionary each potential weak classifier as well as its opposite. The opposite classifier follows the same rule as its counterpart but makes the opposite binary decision every time. Each is therefore right where the other is wrong, which ensures that at least one of them has 50% or greater accuracy. Condition 1 therefore defines the set

$$ \mathcal{A} \subseteq \mathcal{D} = \{1, \ldots, N\} \tag{35} $$

of sufficiently accurate weak classifiers, where $\mathcal{D}$ is the set of all possible values of the index $i$ of weak classifiers in Eq. (9).

Condition 1. For an input-output vector $x \in V$ selected uniformly at random,

$$ \mathcal{A} = \{ i : P[c_i(x) = 1] > 1/2 \}. \tag{36} $$

Here $P[\omega]$ denotes the probability of event $\omega$. We use a probabilistic formulation for our conditions because we imagine the input-output space $V$ to be very large and accessed by random sampling.

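Condition 1 can be estimated from a finite sample of $V$. The sketch below is illustrative only, and everything in it is hypothetical:

- the weak classifiers are Python functions returning $\pm 1$;
- each weak classifier is paired with its opposite, as described above;
- $P[c_i(x) = 1]$ is replaced by the empirical fraction of sampled vectors on which $c_i(x) = h_i(x)\,y(x) = +1$ (Eq. (32));
- the toy labels and dictionary are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[int, ...]
WeakClassifier = Callable[[Vector], int]  # returns +1 (correct) or -1 (erroneous)


def with_opposites(dictionary: Sequence[WeakClassifier]) -> List[WeakClassifier]:
    """Append the opposite of every weak classifier; of each classifier and its
    opposite, at least one agrees with y on 50% or more of any sample."""
    opposites = [lambda x, h=h: -h(x) for h in dictionary]
    return list(dictionary) + opposites


def accurate_set(classifiers: Sequence[WeakClassifier],
                 samples: Sequence[Tuple[Vector, int]]) -> List[int]:
    """Empirical version of Eq. (36): indices i with P[c_i(x) = 1] > 1/2,
    where c_i(x) = h_i(x) * y(x) and samples holds (x, y(x)) pairs."""
    selected = []
    for i, h in enumerate(classifiers):
        correct = sum(1 for x, y in samples if h(x) * y == 1)
        if correct / len(samples) > 0.5:
            selected.append(i)
    return selected


def toy_samples() -> List[Tuple[Vector, int]]:
    """Hypothetical labels: a 3-bit vector is correct (y = +1) iff bit 0 equals bit 1."""
    samples = []
    for k in range(8):
        x = ((k >> 2) & 1, (k >> 1) & 1, k & 1)
        samples.append((x, 1 if x[0] == x[1] else -1))
    return samples


def toy_dictionary() -> List[WeakClassifier]:
    """Hypothetical weak classifiers for the toy labels above."""
    return [
        lambda x: 1 if x[0] == x[1] else -1,  # agrees with y on all 8 vectors
        lambda x: 1 if x[2] == 0 else -1,     # agrees on exactly 4 of 8 vectors
        lambda x: 1 if x[0] != x[1] else -1,  # disagrees with y everywhere
    ]


if __name__ == "__main__":
    print(accurate_set(with_opposites(toy_dictionary()), toy_samples()))
```

The middle classifier and its opposite are both exactly 50% accurate, so the strict inequality in Eq. (36) excludes both. Only the first classifier and the opposite of the third survive.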
Conditions 2 and 3 (or 3a) specify the index set

$$ \mathcal{J} \subseteq \mathcal{A} \times \mathcal{A}, \tag{37} $$

labeling pairs of weak classifiers which will make up the final strong classifier. Condition 2 groups the weak classifiers into pairs which classify the minimal number of vectors $x$ correctly at the same time and give opposite classifications on all other vectors. Condition 3 completes the specification of the index set $\mathcal{J}$: it states that the subsets of vectors $x$ that are classified correctly by the classifier pairs in $\mathcal{J}$ must cover the entire space $V$.



Condition 2. If $(j,j') \in \mathcal{J}$, then

$$ P[(c_j(x) = 1) \cap (c_{j'}(x) = 1)] = P[c_j(x) = 1] + P[c_{j'}(x) = 1] - 1 \tag{38} $$

for an input-output vector $x \in V$ selected uniformly at random.

This condition has the following simple interpretation, illustrated in Fig. 2. Suppose the entire input-output space $V$ is sorted lexicographically (e.g., according to the binary values of the vectors $x \in V$). Suppose further that the $j$th weak classifier is correct on the first $N_j$ vectors but erroneous on the rest, while the $j'$th weak classifier is correct on the last $N_{j'}$ vectors but erroneous on the rest. Then:

- the fraction of vectors correctly classified by the $j$th classifier is $1 - \eta_j = N_j/|V|$;
- the fraction correctly classified by the $j'$th classifier is $1 - \eta_{j'} = N_{j'}/|V|$;
- the two overlap on a fraction $1 - \eta_j - \eta_{j'}$ of the vectors, i.e., all vectors minus each classifier's fraction of incorrectly classified vectors.

This arrangement is shown in the top part of Fig. 2. By "pushing classifier $j'$ to the left", as shown in the bottom part of Fig. 2, the overlap grows and is no longer minimal. This minimality is what Eq. (38) expresses.

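As a worked instance of Eq. (38), take the 60% accurate classifiers of Fig. 3, so that $\eta_j = \eta_{j'} = 0.4$:

$$
\begin{aligned}
P[(c_j(x)=1)\cap(c_{j'}(x)=1)] &= P[c_j(x)=1] + P[c_{j'}(x)=1] - 1 \\
&= 0.6 + 0.6 - 1 = 0.2 = 1 - \eta_j - \eta_{j'}.
\end{aligned}
$$

Each such pair therefore casts two correct votes on exactly 20% of $V$. The five non-overlapping pairs of Fig. 3 together cover a fraction $5 \times 0.2 = 1$ of the space.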
Condition 2 considers only one pair of weak classifiers at a time, which does not suffice to cover all of $V$. Consider a set of weak classifier pairs, each satisfying Condition 2, which together do cover all of $V$. If no two pairs overlap, such a set satisfies $\sum_{(j,j')\in\mathcal{J}} P[(c_j(x) = 1) \cap (c_{j'}(x) = 1)] = 1$ for a randomly chosen $x \in V$. This is illustrated in Fig. 3. However, it is also possible for two or more pairs to overlap. We would like to avoid this as much as possible, so we impose minimal overlap, similarly to Condition 2. Thus we arrive at:

Condition 3.

\begin{align}
&\sum_{(j,j')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)] \nonumber\\
&\quad - \sum_{(j,j')\neq(k,k')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)\cap(c_k(x)=1)\cap(c_{k'}(x)=1)] = 1, \tag{39}
\end{align}

where the subtracted terms give the overlap between two pairs of weak classifiers with labels $(j,j')$ and $(k,k')$. Condition 3 is illustrated in Fig. 4.

It is possible to substitute a similar Condition 3a for the above Condition 3. This creates a different, yet also sufficient, set of conditions for a completely accurate strong classifier. The number of weak classifiers required to satisfy the alternate set of conditions is expected to be smaller than the number required to satisfy the original three conditions. The reason is that the modified conditions use one standalone weak classifier to cover a larger portion of the space correctly than is possible with a pair of weak classifiers.



pair satisfying 
Condition 2 
pair violating 
Condition 2 
V 



Fig. 2: Illustration of Condition [2] Two pairs of classifiers 
showing regions of correct (green) and incorrect (red) classifi- 
cation along a line representing a lexicographical ordering of 
all vectors within V. The top pair, compliant with Condition 2, 
provides two correct classifications for the minimum possible 
number of vectors, voting once correctly and once incorrectly 
on all other vectors. The bottom pair, violating Condition 2, 
provides two correct votes for more vectors than does the 
top pair, but also undesirably provides two incorrect votes for 
some vectors; this is why paired weak classifiers must coincide 
in their classifications on as few vectors as possible. 



Fig. 3: Illustration of Condition 3 without the subtracted term. Five pairs of 60% accurate weak classifiers combine to form a completely accurate majority-vote strong classifier. Moving from top to bottom through the pairs, and from left to right along the vectors in the classification space, each pair of weak classifiers provides two correct votes for 20% of the vector space and neutral votes otherwise. The majority vote is therefore correct on the entire space: every vector receives two correct votes from exactly one pair and neutral votes from all others, since no two pairs vote correctly at once.



Fig. 4: Illustration of Condition 3 with the subtracted term. Three pairs of 70% accurate weak classifiers combine to form a completely accurate majority-vote strong classifier. In this case each pair votes twice correctly on 40% of the vector space, which makes it necessary for the correct portions of the second and third pairs from the top to overlap. Because they overlap only by the minimum amount necessary, $V$ as a whole is still covered by a correct majority vote.

Fig. 5: Illustration of Condition 3a. Two pairs and one single weak classifier form a completely accurate majority-vote strong classifier. The two pairs cover 40% of the vector space with correct votes. The single weak classifier is the first element of the fourth pair in Fig. 3; the faded-out classifiers in the third, fourth, and fifth pairs are omitted from this strong classifier. It provides an extra correct vote that tips the balance in the remaining 60% to a correct overall classification.



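Condition 3a.

\begin{align}
&\sum_{(j,j')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)] + P[c_a(x)=1] \nonumber\\
&\quad - \sum_{(j,j')\neq(k,k')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)\cap(c_k(x)=1)\cap(c_{k'}(x)=1)] \nonumber\\
&\quad - \sum_{(j,j')\in\mathcal{J}} P[(c_a(x)=1)\cap(c_j(x)=1)\cap(c_{j'}(x)=1)] = 1 \tag{40}
\end{align}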

This condition is illustrated in Fig. 5. Its interpretation is similar to that of Condition 3, except that the standalone classifier with the subscript $a$ is added to the other classifier pairs, and its overlap with them is subtracted separately in the last line.

The perfect strong classifier can now be constructed from the weak classifiers in the set $\mathcal{J}$ defined by the conditions above. Define $\mathcal{J}_L$ as the set of all $j$ from pairs $(j,j') \in \mathcal{J}$. Similarly, define $\mathcal{J}_R$ as the set of all $j'$ from pairs $(j,j') \in \mathcal{J}$. A pair with $j = j'$ would not have minimum correctness overlap and therefore could not be in $\mathcal{J}$. It follows that $j \neq j'$ for all pairs $(j,j')$, i.e., $\mathcal{J}_L \cap \mathcal{J}_R = \emptyset$. The strong classifier is then given by Eq. (14), with $w_i = 1$ exactly when $i$ is an element of a pair:

$$ w_i = \begin{cases} 1 & i \in \mathcal{J}_L \cup \mathcal{J}_R \\ 0 & \text{otherwise} \end{cases} \tag{41} $$



B. Perfect strong classifier theorem 

We will now prove that any strong classifier satisfying Conditions 1, 2, and 3, or Conditions 1, 2, and 3a, is completely accurate.

Lemma 1. Assume Condition 2 and $(j,j') \in \mathcal{J}$. Then the sum of the correctness values of the corresponding weak classifiers is nonnegative with probability 1, namely

$$ P[c_j(x) + c_{j'}(x) \geq 0] = 1 \tag{42} $$

for an input-output vector $x \in V$ selected uniformly at random.

Proof: For any pair $(j,j') \in \mathcal{J}$ we have

\begin{align}
P[(c_j(x)=1)\cup(c_{j'}(x)=1)] &= P[c_j(x)=1] + P[c_{j'}(x)=1] - P[(c_j(x)=1)\cap(c_{j'}(x)=1)] \nonumber\\
&= 1 \tag{43}
\end{align}

by Condition 2. Eq. (43) means that at least one of the two weak classifiers evaluates to 1. Since by definition $c_i(x) \in \{-1, 1\}$ for all $i$, the sum is 2 or 0 with probability 1, i.e.,

$$ P[c_j(x) + c_{j'}(x) \in \{0, 2\}] = 1, \tag{44} $$

which implies Eq. (42). ■



Recall two facts. If the majority of the votes given by the weak classifiers comprising a given strong classifier is true, then the input-output vector being voted on receives a strong classification that is true [Eq. (34)]. If this is the case for all input-output vectors, then the strong classifier is perfect [Eq. (31)]. We can now state that this is the case with certainty, provided the weak classifiers belong to the set $\mathcal{J}$ defined by the conditions given above.

Theorem 1. A strong classifier consisting solely of a set of weak classifiers satisfying Conditions 1, 2, and 3 is perfect.



Proof: It suffices to show that the correctness sum is at least 2 with probability 1 when Conditions 1, 2, and 3 are met, namely that

$$ P\Big[\sum_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x)\big) \geq 2\Big] = 1. \tag{45} $$



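Now,

\begin{align}
P\Big[\bigcup_{(j,j')\in\mathcal{J}} \big(c_j(x)+c_{j'}(x)=2\big)\Big]
&= P\Big[\bigcup_{(j,j')\in\mathcal{J}} (c_j(x)=1)\cap(c_{j'}(x)=1)\Big] \tag{46a}\\
&\geq \sum_{(j,j')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)] \nonumber\\
&\quad - \sum_{(j,j')\neq(k,k')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)\cap(c_k(x)=1)\cap(c_{k'}(x)=1)] \tag{46b}\\
&= 1 \quad \text{by Cond. 3.} \tag{46c}
\end{align}

Here (46b) truncates the inclusion-exclusion series after a subtracted term³. The probability of an event cannot be greater than 1, so the lower bound of 1 in (46c) makes the left-hand side of (46a) exactly 1.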

Thus, for any randomly selected vector $x \in V$, the correctness sum of at least one of the pairs is 2, i.e.,

$$ P[\exists (j,j') \in \mathcal{J} : c_j(x) + c_{j'}(x) = 2] = 1. \tag{47} $$



³ This inequality reflects the fact that, for $n$ overlapping sets,
$$ P\Big[\bigcup_{i=1}^{n} s_i\Big] = \sum_{i} P[s_i] - \sum_{i<j} P[s_i \cap s_j] + \sum_{i<j<k} P[s_i \cap s_j \cap s_k] - \sum_{i<j<k<m} P[s_i \cap s_j \cap s_k \cap s_m] + \cdots $$
Each term is larger than the next in the series, since $n+1$ sets cannot intersect where $n$ sets do not. Our truncation of the series is less than or equal to the full value because we stop after a subtracted term (a Bonferroni inequality).

Lemma 1 tells us that the correctness sum of each pair of weak classifiers is nonnegative. Eq. (47) states that for at least one pair this sum is not just nonnegative but equal to 2. Therefore the correctness sum of all weak classifiers in $\mathcal{J}$ is at least 2, which is Eq. (45). ■

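The Fig. 4 construction can be checked exhaustively on a discretized toy space. The sketch below is an illustration, not the authors' code, and rests on three assumptions:

- $V$ consists of 10 equally likely vectors;
- each weak classifier is represented only by the set of vectors it classifies correctly;
- the subtracted sum in Eq. (39) runs over unordered pairs of distinct pairs. This is the reading under which the Fig. 4 example sums to exactly 1.

Exact rational arithmetic (`fractions.Fraction`) avoids rounding issues.

```python
from fractions import Fraction
from itertools import combinations

N_VECTORS = 10  # toy V: vectors 0..9, each with probability 1/10


def prob(cells):
    """P[x in cells] for x uniform on the toy space."""
    return Fraction(len(cells), N_VECTORS)


def satisfies_condition_2(pair):
    """Eq. (38): the overlap of the two correct sets is minimal."""
    left, right = pair
    return prob(left & right) == prob(left) + prob(right) - 1


def condition_3_lhs(pairs):
    """Left-hand side of Eq. (39); overlaps subtracted once per unordered pair of pairs."""
    covered = sum(prob(a & b) for a, b in pairs)
    overlap = sum(prob(a & b & c & d) for (a, b), (c, d) in combinations(pairs, 2))
    return covered - overlap


def correctness_sum(pairs, x):
    """Sum over pairs of c_j(x) + c_j'(x), with c = +1 if x is in the correct set, else -1."""
    return sum((1 if x in a else -1) + (1 if x in b else -1) for a, b in pairs)


# Three pairs of 70%-accurate classifiers as in Fig. 4 (hypothetical layout).
FIG4_PAIRS = [
    ({0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 7, 8, 9}),  # jointly correct on {0..3}
    ({4, 5, 6, 7, 8, 9, 0}, {4, 5, 6, 7, 1, 2, 3}),  # jointly correct on {4..7}
    ({6, 7, 8, 9, 0, 1, 2}, {6, 7, 8, 9, 3, 4, 5}),  # jointly correct on {6..9}
]

if __name__ == "__main__":
    print("Condition 2:", all(satisfies_condition_2(p) for p in FIG4_PAIRS))
    print("Condition 3 LHS:", condition_3_lhs(FIG4_PAIRS))
    print("min correctness sum:",
          min(correctness_sum(FIG4_PAIRS, x) for x in range(N_VECTORS)))
```

Every pair is jointly correct on 40% of the space, the second and third pairs overlap on 20%, and the smallest correctness sum over all ten vectors is 2, as Eq. (45) requires.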
Theorem 2. A strong classifier consisting solely of a set of weak classifiers satisfying Conditions 1, 2, and 3a is perfect.

Proof: It suffices to show that the correctness sum is at least 1 with probability 1 when Conditions 1, 2, and 3a are met, namely that

$$ P\Big[\sum_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x)\big) + c_a(x) \geq 1\Big] = 1. \tag{48} $$

We proceed similarly to the proof of Theorem 1:



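\begin{align}
&P\Big[\bigcup_{(j,j')\in\mathcal{J}} \big(c_j(x)+c_{j'}(x)=2\big) \cup \big(c_a(x)=1\big)\Big] \nonumber\\
&\quad\geq \sum_{(j,j')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)] + P[c_a(x)=1] \nonumber\\
&\qquad - \sum_{(j,j')\neq(k,k')\in\mathcal{J}} P[(c_j(x)=1)\cap(c_{j'}(x)=1)\cap(c_k(x)=1)\cap(c_{k'}(x)=1)] \nonumber\\
&\qquad - \sum_{(j,j')\in\mathcal{J}} P[(c_a(x)=1)\cap(c_j(x)=1)\cap(c_{j'}(x)=1)] \nonumber\\
&\quad= 1 \quad \text{by Cond. 3a.} \tag{49}
\end{align}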

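Thus, with probability 1, either some pair has correctness sum 2 or the singled-out weak classifier is correct, i.e.,

$$ P\big[\{\exists (j,j') \in \mathcal{J} : c_j(x) + c_{j'}(x) = 2\} \cup \{c_a(x) = 1\}\big] = 1. \tag{50} $$

This result, together with Lemma 1, implies that the correctness sum of all weak classifiers in $\mathcal{J}$ is at least 1. There are two cases:

- If some pair contributes 2, the remaining pairs contribute at least 0 and $c_a(x) \geq -1$.
- If instead $c_a(x) = 1$, all pairs contribute at least 0.

In either case the sum is at least 1, which is Eq. (48). ■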

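The Fig. 5 construction can likewise be checked against Eqs. (40) and (48) on the same hypothetical 10-vector toy space. As before, this is an illustration, and the pair-pair overlap sum is read over unordered pairs. Two pairs of 60% accurate classifiers are jointly correct on 40% of $V$, and a single 60% accurate classifier $a$ is correct exactly on the remaining 60%.

```python
from fractions import Fraction
from itertools import combinations

N_VECTORS = 10  # toy V: vectors 0..9, each with probability 1/10


def prob(cells):
    """P[x in cells] for x uniform on the toy space."""
    return Fraction(len(cells), N_VECTORS)


def condition_3a_lhs(pairs, single):
    """Left-hand side of Eq. (40); pair-pair overlaps counted once per unordered pair."""
    covered = sum(prob(a & b) for a, b in pairs) + prob(single)
    pair_overlap = sum(prob(a & b & c & d) for (a, b), (c, d) in combinations(pairs, 2))
    single_overlap = sum(prob(single & a & b) for a, b in pairs)
    return covered - pair_overlap - single_overlap


def vote(cells, x):
    """Correctness value c(x): +1 if x is classified correctly, else -1."""
    return 1 if x in cells else -1


def correctness_sum(pairs, single, x):
    """Left-hand quantity of Eq. (48) evaluated at a single vector x."""
    return sum(vote(a, x) + vote(b, x) for a, b in pairs) + vote(single, x)


# Hypothetical layout in the spirit of Fig. 5.
FIG5_PAIRS = [
    ({0, 1, 2, 3, 4, 5}, {0, 1, 6, 7, 8, 9}),  # jointly correct on {0, 1}
    ({2, 3, 4, 5, 6, 7}, {2, 3, 8, 9, 0, 1}),  # jointly correct on {2, 3}
]
FIG5_SINGLE = {4, 5, 6, 7, 8, 9}  # classifier a, correct on the remaining 60%

if __name__ == "__main__":
    print("Condition 3a LHS:", condition_3a_lhs(FIG5_PAIRS, FIG5_SINGLE))
    print("min correctness sum:",
          min(correctness_sum(FIG5_PAIRS, FIG5_SINGLE, x) for x in range(N_VECTORS)))
```

Each pair also satisfies Condition 2 (a joint overlap of $0.6 + 0.6 - 1 = 0.2$). The minimum correctness sum over $V$ is exactly 1, matching the bound of Theorem 2.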


C. Imperfect strong classifier theorem 

Because the three conditions on the set $\mathcal{J}$ of weak classifiers guarantee a completely accurate strong classifier, errors in the strong classifier must mean that the conditions are violated in some way. For instance, Condition 2 could be replaced by a weaker condition that allows more than the minimum overlap of vectors $x$ categorized correctly by both weak classifiers in a pair.



Condition 2a. If $(j,j') \in \mathcal{J}$, then

$$ P[(c_j(x)=1)\cap(c_{j'}(x)=1)] = P[c_j(x)=1] + P[c_{j'}(x)=1] - 1 + \epsilon_{jj'} \tag{51} $$

for an input-output vector $x \in V$ selected uniformly at random.

The quantity $\epsilon_{jj'}$ is a measure of the "overlap error". We can use it to prove relaxed versions of Lemma 1 and Theorem 1.

Lemma 1a. Assume Condition 2a and $(j,j') \in \mathcal{J}$. Then the sum of the correctness values of the corresponding weak classifiers is nonnegative with probability $1 - \epsilon_{jj'}$, namely

$$ P[c_j(x) + c_{j'}(x) \geq 0] = 1 - \epsilon_{jj'} \tag{52} $$

for an input-output vector $x \in V$ selected uniformly at random.
Proof: The proof closely mimics that of Lemma 1:

\begin{align}
P[(c_j(x)=1)\cup(c_{j'}(x)=1)] &= P[c_j(x)=1] + P[c_{j'}(x)=1] - P[(c_j(x)=1)\cap(c_{j'}(x)=1)] \nonumber\\
&= P[c_j(x)=1] + P[c_{j'}(x)=1] - P[c_j(x)=1] - P[c_{j'}(x)=1] + 1 - \epsilon_{jj'} \nonumber\\
&= 1 - \epsilon_{jj'} \tag{53}
\end{align}

by Condition 2a. As in the proof of Lemma 1, this implies

$$ P[c_j(x) + c_{j'}(x) \in \{0, 2\}] = 1 - \epsilon_{jj'}, \tag{54} $$

which is Eq. (52). ■


We can now replace Theorem 1 by a lower bound on the success probability when Condition 2 is replaced by the weaker Condition 2a. Let us first define an imperfect strong classifier as follows:

Definition 7. A strong classifier is $\epsilon$-perfect if, for $x \in V$ chosen uniformly at random, it correctly classifies $x$ [i.e., $Q_w(x) = y(x)$] with probability at least $1 - \epsilon$.

Theorem 3. A strong classifier consisting solely of a set of weak classifiers satisfying Conditions 1, 2a, and 3 is $\epsilon$-perfect, where $\epsilon = \sum_{(j,j')\in\mathcal{J}} \epsilon_{jj'}$.

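Proof: It suffices to show that, when Conditions 1, 2a, and 3 are satisfied, the correctness sum is positive with probability at least 1 minus the sum of the overlap errors, namely

$$ P\Big[\sum_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x)\big) > 0\Big] \geq 1 - \sum_{(j,j')\in\mathcal{J}} \epsilon_{jj'}. \tag{55} $$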



Now, by definition $c_j(x) + c_{j'}(x) \in \{-2, 0, 2\}$. For the correctness sum over all weak classifiers in $\mathcal{J}$ to be negative, the correctness sum of at least one of the pairs must be negative, so that

\begin{align}
&P\Big[\sum_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x)\big) < 0\Big] \tag{56a}\\
&\quad\leq P[\exists (j,j') \in \mathcal{J} : c_j(x) + c_{j'}(x) = -2]. \tag{56b}
\end{align}



However, we also need to exclude the case of all weak classifier pairs summing to zero, since otherwise the strong classifier can be inconclusive. This case is partially excluded by virtue of Condition 3, which tells us that $V$ as a whole is always covered by a correct majority vote. Formally,

\begin{align}
P\Big[\bigcap_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x) = 0\big)\Big]
&= 1 - P[\exists (j,j') \in \mathcal{J} : c_j(x) + c_{j'}(x) \neq 0] \nonumber\\
&\leq 1 - P[\exists (j,j') \in \mathcal{J} : c_j(x) + c_{j'}(x) > 0] = 0, \tag{57}
\end{align}

where in the last equality we invoked the calculation leading from Eq. (46c) to Eq. (47), which required only Condition 3. Alternatively, we could use Condition 3a to prove that $P\big[\sum_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x)\big) + c_a(x) = 0\big] = 0$.

There is another way for the classifier to return an inconclusive result: one weak classifier pair may have a correctness sum of 2 while another has a correctness sum of $-2$. This case is included in the bound in Eq. (56b), because one of the weak classifier pairs in this scenario has a negative correctness sum. We can thus conclude that the strict inequality $< 0$ inside the probability in Eq. (56a) can be replaced by $\leq 0$.

Now, Eq. (56b) is the probability that at least one pair has a correctness sum of $-2$. By the union bound, this cannot be greater than the sum of the individual probabilities:

$$ \text{Eq. (56b)} \leq P\Big[\bigcup_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x) = -2\big)\Big] \leq \sum_{(j,j')\in\mathcal{J}} P[c_j(x) + c_{j'}(x) = -2] = \sum_{(j,j')\in\mathcal{J}} \epsilon_{jj'}, \tag{58} $$

where the last equality follows from Lemma 1a. This proves Eq. (55). ■

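To see the bound of Theorem 3 at work, the Fig. 4 toy construction can be perturbed so that one pair violates Condition 2 but satisfies Condition 2a. The sketch uses the same hypothetical 10-vector space and conventions as the earlier checks. The right-hand classifier of the first pair loses vector 9. This gives that pair an overlap error $\epsilon_{jj'} = 0.1$ while leaving the jointly correct regions, and hence Condition 3, unchanged. Theorem 3 then predicts a correct classification with probability at least $1 - 0.1$.

```python
from fractions import Fraction

N_VECTORS = 10  # toy V: vectors 0..9, each with probability 1/10


def prob(cells):
    """P[x in cells] for x uniform on the toy space."""
    return Fraction(len(cells), N_VECTORS)


def overlap_error(pair):
    """epsilon_{jj'} from Eq. (51): actual overlap minus the minimal overlap."""
    left, right = pair
    return prob(left & right) - (prob(left) + prob(right) - 1)


def correctness_sum(pairs, x):
    """Sum over pairs of c_j(x) + c_j'(x), with c = +1 if x is in the correct set."""
    return sum((1 if x in a else -1) + (1 if x in b else -1) for a, b in pairs)


PERTURBED_PAIRS = [
    ({0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 7, 8}),   # right member no longer covers vector 9
    ({4, 5, 6, 7, 8, 9, 0}, {4, 5, 6, 7, 1, 2, 3}),
    ({6, 7, 8, 9, 0, 1, 2}, {6, 7, 8, 9, 3, 4, 5}),
]

if __name__ == "__main__":
    epsilon = sum(overlap_error(p) for p in PERTURBED_PAIRS)
    positive = {x for x in range(N_VECTORS) if correctness_sum(PERTURBED_PAIRS, x) > 0}
    print("epsilon =", epsilon, " P[sum > 0] =", prob(positive), " bound:", 1 - epsilon)
```

On vector 9 the first pair now votes $-2$ and the third pair votes $+2$, so the strong classifier is inconclusive there. In this example the bound $1 - \epsilon$ of Eq. (55) is attained exactly.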
As alluded to in this proof, it is interesting to consider dropping Conditions 3 and 3a. Eq. (55) would then become $P\big[\sum_{(j,j')\in\mathcal{J}} \big(c_j(x) + c_{j'}(x)\big) \geq 0\big] \geq 1 - \sum_{(j,j')\in\mathcal{J}} \epsilon_{jj'}$ (note the change from $>$ to $\geq$). Theorem 3 would then become a statement about inconclusive $\epsilon$-perfect strong classifiers, which can, with finite probability, yield a "don't-know" answer. This may be a useful tradeoff if it turns out to be difficult to construct a set of weak classifiers satisfying Condition 3 or 3a.

D. An alternate weight optimization problem 

The conditions and results established in the previous subsection for correctness of the strong classifier suggest an alternate weight optimization problem for selecting the weak classifiers that will be included in the final majority vote. It replaces the optimization problem of Section III-D.



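The new optimization problem is defined over the space of pairs of weak classifiers rather than single classifiers. These pairs can be constructed from elements of the set $\mathcal{A} \times \mathcal{A}$, with $\mathcal{A}$ as defined in Condition 1. We define the ideal pair weight as

$$ \tilde{w}_{ij} = \begin{cases} 1 & (i,j) \in \mathcal{J} \\ 0 & \text{otherwise} \end{cases} \tag{59} $$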



Since we do not know the set $\mathcal{J}$ a priori, we shall define a QUBO whose solutions $w_{ij} \in \{0,1\}$, with $(i,j) \in \mathcal{A} \times \mathcal{A}$, will be an approximation to the ideal pair weights $\tilde{w}_{ij}$. In the process, we shall map the pair weight bits $w_{ij}$ to qubits. Each $w_{ij}$ determines whether its corresponding pair of weak classifiers, $h_i$ and $h_j$, will be included in the new strong classifier, which can thus be written as:

$$ Q^{\mathrm{pair}}(x) = \mathrm{sign}\Big[\sum_{(i,j)\in\mathcal{A}\times\mathcal{A}} w_{ij}\big(h_i(x) + h_j(x)\big)\Big]. \tag{60} $$



Recall that we do not know the $w_{ij}$ a priori. In our approach they are found by solving a QUBO, which we set up as follows:

$$ w^{\mathrm{pair}} = \arg\min_{w} \Big[\sum_{(i,j)\in\mathcal{A}\times\mathcal{A}} \alpha_{ij} w_{ij} + \sum_{(i,j)\neq(k,l)\in\mathcal{A}\times\mathcal{A}} J_{ijkl}\, w_{ij} w_{kl}\Big], \tag{61} $$

where the second term is a double sum over all pairs of unequal pairs. The solution of this QUBO will provide an approximation to the set $\mathcal{J}$, which yields the desired set of weak classifiers as in Eq. (41). Sparsity can be enforced as in the original weight optimization problem by replacing $\alpha_{ij}$ with $\alpha_{ij} + \lambda$, where $\lambda > 0$, i.e., by including a penalty proportional to $\|w\|_0$.

The terms $\alpha_{ij}$ and $J_{ijkl}$ reward compliance with Conditions 2 and 3, respectively. To define $\alpha_{ij}$, we first define the modified correctness function $c'_i : \mathcal{T} \mapsto \{0, 1\}$, where $\mathcal{T}$ is the training set (13):

$$ c'_i(x_s, y_s) = \begin{cases} 1 & h_i(x_s) = y_s \\ 0 & h_i(x_s) \neq y_s \end{cases} \tag{62} $$



Below we write $c'_i(s)$ in place of $c'_i(x_s, y_s)$ for notational simplicity. The term $\alpha_{ij}$ rewards pairing weak classifiers that classify the minimal number of vectors $x$ incorrectly at the same time, as specified by Condition 2. Each included pair gains negative weight for the training set vectors its members classify correctly. It also receives a positive penalty for any vector classified incorrectly by both weak classifiers at once:

$$ \alpha_{ij} = \sum_{s=1}^{|\mathcal{T}|} \Big[-c'_i(s) - c'_j(s) + \big(1 - c'_i(s)\big)\big(1 - c'_j(s)\big)\Big] \tag{63} $$

The term $J_{ijkl}$ penalizes the inclusion of pairs that are too similar to each other, as codified in Condition 3. This is accomplished by assigning a positive weight to each vector that is classified correctly by two pairs at once:

$$ J_{ijkl} = \sum_{s=1}^{|\mathcal{T}|} c'_i(s)\, c'_j(s)\, c'_k(s)\, c'_l(s) \tag{64} $$

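Below is a brute-force sketch of the pair QUBO of Eq. (61) for a tiny hypothetical training set. It makes these assumptions:

- $\alpha_{ij} = \sum_s [-c'_i(s) - c'_j(s) + (1 - c'_i(s))(1 - c'_j(s))]$ and $J_{ijkl} = \sum_s c'_i(s)c'_j(s)c'_k(s)c'_l(s)$, as in the forms of Eqs. (63) and (64) given above;
- each weak classifier is represented only by its values $c'_i(s)$ on the training set;
- only pairs with $i < j$ are kept, so that $(i,j)$ and $(j,i)$ are not counted twice. This departs from the full $\mathcal{A} \times \mathcal{A}$ of Eq. (61).

Exhaustive enumeration stands in for the adiabatic solver and is feasible only for a handful of qubits.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def alpha(ci: Sequence[int], cj: Sequence[int]) -> int:
    """Linear coefficient: reward joint correctness, penalize joint errors."""
    return sum(-a - b + (1 - a) * (1 - b) for a, b in zip(ci, cj))


def coupling(ci, cj, ck, cl) -> int:
    """Quadratic coefficient: count training vectors correct under both pairs."""
    return sum(a * b * c * d for a, b, c, d in zip(ci, cj, ck, cl))


def solve_pair_qubo(c_prime: List[List[int]]):
    """Exhaustively minimize the pair QUBO over binary weights w_ij.

    c_prime[i][s] is c'_i(s). Returns (weights keyed by pair, minimal energy).
    """
    pairs: List[Pair] = list(combinations(range(len(c_prime)), 2))
    lin: Dict[Pair, int] = {p: alpha(c_prime[p[0]], c_prime[p[1]]) for p in pairs}
    best_w, best_energy = None, None
    for bits in product((0, 1), repeat=len(pairs)):
        energy = sum(lin[p] * w for p, w in zip(pairs, bits))
        # double sum over ordered pairs of distinct pairs, as in Eq. (61)
        for p, wp in zip(pairs, bits):
            for q, wq in zip(pairs, bits):
                if p != q and wp and wq:
                    energy += coupling(c_prime[p[0]], c_prime[p[1]],
                                       c_prime[q[0]], c_prime[q[1]])
        if best_energy is None or energy < best_energy:
            best_w, best_energy = dict(zip(pairs, bits)), energy
    return best_w, best_energy


if __name__ == "__main__":
    # Hypothetical c'_i(s) for three weak classifiers on five training vectors.
    toy = [[1, 1, 1, 0, 0], [0, 0, 1, 1, 1], [1, 1, 1, 0, 1]]
    print(solve_pair_qubo(toy))
```

In this toy the couplings are small compared with the rewards, so all three pairs are selected. Larger overlaps between pairs would make $J_{ijkl}$ suppress some of them.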


We now have a QUBO for the alternate weight optimization problem. This can be translated to an Ising Hamiltonian, as was done with the original optimization problem in Section III-E. We again map from our QUBO variables $w_{ij}$ to variables $q_{ij} = 2(w_{ij} - 1/2)$, yielding the following optimization function:



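$$ q^{\mathrm{pair}} = \arg\min_{q} \Big[\sum_{(i,j)\in\mathcal{A}\times\mathcal{A}} \beta_{ij} q_{ij} + \frac{1}{4} \sum_{(i,j)\neq(k,l)\in\mathcal{A}\times\mathcal{A}} J_{ijkl}\, q_{ij} q_{kl}\Big], \tag{65} $$

where

$$ \beta_{ij} = \frac{\alpha_{ij}}{2} + \frac{1}{4} \sum_{(k,l)\in\mathcal{A}\times\mathcal{A};\,(k,l)\neq(i,j)} \big(J_{ijkl} + J_{klij}\big). \tag{66} $$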



Constant terms were omitted because they have no bearing on the minimization. This optimization function is now suitable for direct translation to the final Hamiltonian for an AQC:

$$ H_F = \sum_{(i,j)\in\mathcal{A}\times\mathcal{A}} \beta_{ij} Z_{ij} + \frac{1}{4} \sum_{(i,j)\neq(k,l)\in\mathcal{A}\times\mathcal{A}} J_{ijkl}\, Z_{ij} Z_{kl}. \tag{67} $$



The qubits now represent weights on pairs rather than on individual classifiers. $Z_{ij}$ is therefore the Pauli $\sigma^z$ operator on the qubit assigned to the pair $(i,j) \in \mathcal{A} \times \mathcal{A}$. Using $|\mathcal{A}|^2$ qubits, this approach will give the optimal combination of weak classifier pairs over the training set according to the conditions set forth previously.

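The substitution $w_{ij} = (q_{ij} + 1)/2$ behind the Ising form can be checked numerically. For every assignment, the QUBO objective $\sum_a \alpha_a w_a + \sum_{a \neq b} J_{ab} w_a w_b$ and the Ising objective $\sum_a \beta_a q_a + \frac{1}{4}\sum_{a \neq b} J_{ab} q_a q_b$, with $\beta_a = \alpha_a/2 + \frac{1}{4}\sum_{b \neq a}(J_{ab} + J_{ba})$, should differ by the same constant (the omitted constant terms). The sketch uses random hypothetical coefficients on a few abstract variables $a, b$, which stand for pair indices $(i,j)$. It does not depend on how $\alpha$ and $J$ were obtained.

```python
import random
from itertools import product


def qubo_energy(w, alpha, J):
    """sum_a alpha_a w_a + sum_{a != b} J_ab w_a w_b (ordered a, b)."""
    n = len(w)
    return (sum(alpha[a] * w[a] for a in range(n))
            + sum(J[a][b] * w[a] * w[b] for a in range(n) for b in range(n) if a != b))


def ising_coefficients(alpha, J):
    """beta_a = alpha_a / 2 + (1/4) sum_{b != a} (J_ab + J_ba)."""
    n = len(alpha)
    return [alpha[a] / 2 + sum(J[a][b] + J[b][a] for b in range(n) if b != a) / 4
            for a in range(n)]


def ising_energy(q, beta, J):
    """sum_a beta_a q_a + (1/4) sum_{a != b} J_ab q_a q_b."""
    n = len(q)
    return (sum(beta[a] * q[a] for a in range(n))
            + sum(J[a][b] * q[a] * q[b] for a in range(n) for b in range(n) if a != b) / 4)


def offsets(n=4, seed=7):
    """QUBO-minus-Ising energy differences over all 2^n assignments."""
    rng = random.Random(seed)
    alpha = [rng.uniform(-2, 2) for _ in range(n)]
    J = [[rng.uniform(0, 1) for _ in range(n)] for _ in range(n)]
    beta = ising_coefficients(alpha, J)
    diffs = []
    for w in product((0, 1), repeat=n):
        q = [2 * (wa - 0.5) for wa in w]  # q = 2(w - 1/2), i.e. q in {-1, +1}
        diffs.append(qubo_energy(w, alpha, J) - ising_energy(q, beta, J))
    return diffs


if __name__ == "__main__":
    d = offsets()
    print("spread of offsets:", max(d) - min(d))  # ~0: one constant shift
```

A spread at the level of floating-point round-off confirms that both objectives have the same minimizers.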


V. Using strong classifiers in quantum-parallel 

Now let us suppose that we have already trained our strong classifier and found the optimal weight vector $w^{\mathrm{opt}}$ or $w^{\mathrm{pair}}$. For simplicity we shall henceforth limit our discussion to $w^{\mathrm{opt}}$. We can use the trained classifier to classify new input-output pairs $x \notin \mathcal{T}$, to decide whether they are correct or erroneous. In this section we address how to obtain a further quantum speedup when exhaustively testing all exponentially many ($2^{N_{\mathrm{in}}+N_{\mathrm{out}}}$) input-output pairs $x$. The key observation is that software error testing can be formulated as a minimization problem over the space $V$ of all input-output pairs $x$. An AQC algorithm will then perform a quantum-parallel search over this entire space, returning an erroneous state as the ground state.



A. Using two strong binary classifiers to detect errors 

Recall that we are concerned with the detection of vectors $x \in V$ that are erroneous and implemented [Eq. (1)]. To accomplish this, we use two strong classifiers. The specification classifier is the binary classifier developed in Section III. Ideally, it behaves as follows:

$$ Q_w(x) = \begin{cases} +1 & x \in S \\ -1 & x \notin S \end{cases} \tag{68} $$

The second classifier, which we will call the implementation classifier, determines whether or not an input-output vector is in the program as implemented. It is constructed in the same way as $Q_w(x)$, but with its own appropriate training set. Ideally, it behaves as follows:

$$ T_z(x) = \begin{cases} -1 & x \in \hat{S} \\ +1 & x \notin \hat{S} \end{cases} \tag{69} $$



The four possible combinations represented by Eqs. (68) and (69) correspond to the four cases covered by Definitions 1-4. The worrisome input-output vectors, those that are erroneous and implemented, cause both classifiers to evaluate to $-1$.

B. Formal criterion 

As a first step, suppose we use the optimal weight vector in the original strong specification classifier. We then have, from Eq. (13),

$$ Q^{\mathrm{opt}}(x) = \mathrm{sign}\big[R_{w^{\mathrm{opt}}}(x)\big] = \mathrm{sign}\Big[\sum_{i=1}^{N} w_i^{\mathrm{opt}} h_i(x)\Big]. \tag{70} $$

This is, of course, imprecise, because our adiabatic algorithm solves a relaxed optimization problem: it returns $w^{\mathrm{opt}}$ rather than the optimum of the unrelaxed problem. We shall nevertheless assume that the replacement is sufficiently close to the true optimum for our purposes. With this caveat, Eq. (70) is the optimal strong specification classifier for a given input-output vector $x$. It classifies $x$ as erroneous if $Q^{\mathrm{opt}}(x) = -1$ and as correct if $Q^{\mathrm{opt}}(x) = +1$.

The strong implementation classifier is constructed similarly to the specification classifier:

$$ T^{\mathrm{opt}}(x) = \mathrm{sign}\big[U_{z^{\mathrm{opt}}}(x)\big] = \mathrm{sign}\Big[\sum_{i=1}^{N} z_i^{\mathrm{opt}} h_i(x)\Big]. \tag{71} $$

Here, $h_i$ are the same weak classifiers as those used to train the specification classifier. $T^{\mathrm{opt}}$, however, is constructed independently from a training set $\mathcal{T}'$, which may or may not overlap with $\mathcal{T}$. This training set is labeled according to whether or not the implemented program can produce the input-output pairs in $\mathcal{T}'$. The result of this optimization is the weight vector $z^{\mathrm{opt}}$.

Given the results of the classifiers C] opt (f) and T opt (x) 
for any vector x, the V&V task of identifying whether or 
not x € (S n — ■<§) reduces to the following. Any vector x is 
flagged as erroneous and implemented if Q° pt (x) +T opt (x) = 
—2. We stress once more that, due to our use of the relaxed 
optimization to solve for w opt and z° pt , a flagged x may in 
fact be neither erroneous nor implemented, i.e., our procedure 
is susceptible to both false positives and false negatives. 
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C. Relaxed criterion 



D. Adiabatic implementation of the relaxed criterion 



As was the case with Eq. ( |18[ ), Q opt 7^ opt is unfortunately 
not directly implementable in AQC, but a simple relaxation is. 
The trick is again to remove the sign function, this time from 
( |70| i and {71) , and consider the sum of the two classifiers' 
majority vote functions directly as an energy function: 



C opt (f) = fl^pt(x) + Uz« P t(x) 



(72) 



The combination of the two classifiers gives different results 
for vectors falling under each of the Definitions from Section 
ITFB21 

Case 1: x £ S and x E S 
The vector x is an error implemented in the program and 
manifests a software error. These vectors gain negative weight 
from both classifiers R^o V t and fTg&pt. Vectors falling under 
this definition should receive the lowest values of C opt , if any 
such vectors exist. 

Case 2: x G S and x € S 
The vector x satisfies the don't-worry condition, that is, it is 
a correct input-output string, part of the ideal program P. In 
this case, R^apt > and Ugopt < 0. In the programs quantum 
V&V is likely to be used for, with very infrequent, elusive 
errors, the specification and implementation will be similar 
and the negative weight of Ugo P t < should be moderated 
enough by the positive influence of R^opt > that don't- 
worry vectors should not populate the lowest-lying states. 

Case 3: x G S and x S 
The input portion of the vector x is a don't-care condition. It 
does not violate any program specifications, but is not impor- 
tant enough to be specifically addressed in the implementation. 
This vector will gain positive weight from both R^apt and 
Ugopt and should therefore never be misidentified as an error. 

Case 4: x £ S and x £ S 
The vectors x in this category would be seen as erroneous 
by the program specification - if they ever occurred. Because 
they fall outside the program implementation S, they are not 
the errors we are trying to find. This case is similar to the 
don't-worry situation in that the two strong classifiers will have 
opposite signs, in this case i?^o P t < and Ugapt > 0. By the 
same argument as Definition 2, Definition 4 vectors should not 
receive more negative values of C opt than the targeted errors. 

Having examined the values of $C^{\mathrm{opt}}(x)$ for the relevant
categories of $x$, we can formulate error detection as the
following minimization problem:

$$x_e = \arg\min_{x} C^{\mathrm{opt}}(x). \qquad (73)$$

Suppose the algorithm returns a solution $x_e$ (e for "error").
We then need to test that it is indeed an error, which amounts
to checking that it behaves incorrectly when considered as an
input-output pair in the program implementation P. Note that
testing that $R^{\mathrm{opt}}(x_e) < 0$ is insufficient, since our procedure
involved a sequence of relaxations.

In order to implement the error identification strategy (73)
we need to consider

$$C^{\mathrm{opt}}(x) = \sum_{i=1}^{N} \left(w_i^{\mathrm{opt}} + z_i^{\mathrm{opt}}\right) h_i(x) \qquad (74)$$

as an energy function. We then consider $C^{\mathrm{opt}}(x)$ as the final
Hamiltonian $H_F$ for an AQC, with Hilbert space spanned
by the basis $\{|x\rangle : x \in \mathcal{V}\}$. The AQC will then find the state which
minimizes $C^{\mathrm{opt}}(x)$ out of all $2^{N_{\mathrm{in}}+N_{\mathrm{out}}}$ basis states and thus
identify an error candidate. Because the AQC always returns
some error candidate, our procedure never generates false
negatives. However, Cases 2 and 4 would correspond to false
positives if an input-output vector satisfying either one of
these cases is found as the AQC output.

We can rely on the fact that the AQC actually returns a
(close approximation to the) Boltzmann distribution

$$\Pr[x] = \frac{1}{Z}\exp\left[-C^{\mathrm{opt}}(x)/(k_B T)\right], \qquad (75)$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature,
and $Z = \sum_{x} \exp[-C^{\mathrm{opt}}(x)/(k_B T)]$ is the partition function.
For sufficiently low temperature this probability distribution
is sharply peaked around the ground state, with contributions
from the first few excited states. Thus we can expect that even
if there is a low-lying state that has been pushed there by only
one of the two binary classifiers $Q^{\mathrm{opt}}$ or $T^{\mathrm{opt}}$, the AQC will
return a nearby state which is both erroneous and implemented
some of the time, and an error will still be detected. Even if
the undesirable state [$x \in S$ and $x \in \tilde{S}$, or $x \notin S$ and $x \notin \tilde{S}$]
is the ground state, and hence all erroneous states [$x \notin S$
and $x \in \tilde{S}$] are excited states, their lowest-energy member
will be found with a probability that is smaller by a factor
$e^{-\Delta(t_F)/(k_B T)}$ than that of the unlooked-for state, where $\Delta(t_F)$ is
the energy gap to the first excited state at the end of the
computation. Provided $k_B T$ and $\Delta(t_F)$ are of the same order,
this probability will be appreciable.

To ensure that errors which are members of the training set
are never identified as ground states, we construct the training
set $\mathcal{T}$ so that it only includes correct states, i.e., $y_s = +1\ \forall s$.
This has the potential drawback that the classifier never trains
directly on errors. It is in principle possible to include errors in
the training set ($y_s = -1$) by adding another penalty term to
the strong classifier which directly penalizes such training set
members, but whether this can be done without introducing
many-body interactions in $H_F$ is a problem that is beyond the
scope of this work.
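To make the energy-function picture concrete, the following is a minimal classical sketch, not the AQC itself: it enumerates every basis state of a tiny, hypothetical 3-bit input-output space, evaluates $C^{\mathrm{opt}}(x)$ as in Eq. (74), takes the brute-force argmin of Eq. (73), and computes the Boltzmann weights of Eq. (75). The three weak classifiers (assumed here to vote $\pm 1$), the weight vectors `w` and `z`, and the value of `kT` are illustrative assumptions, not values from this work.

```python
import itertools
import math

def energy(x, weak_classifiers, w, z):
    """C^opt(x) = sum_i (w_i + z_i) h_i(x), Eq. (74)."""
    return sum((wi + zi) * h(x) for h, wi, zi in zip(weak_classifiers, w, z))

def boltzmann(energies, kT):
    """Pr[x] = exp(-C(x)/kT) / Z, Eq. (75), for a dict {state: energy}."""
    weights = {x: math.exp(-e / kT) for x, e in energies.items()}
    Z = sum(weights.values())  # partition function
    return {x: wt / Z for x, wt in weights.items()}

# Hypothetical toy dictionary on 3-bit vectors x = (x1, x2, x3); each h_i votes +1 or -1.
weak = [
    lambda x: 1 if x[2] == (x[0] & x[1]) else -1,  # x3 == x1 AND x2
    lambda x: 1 if x[2] == (x[0] | x[1]) else -1,  # x3 == x1 OR x2
    lambda x: 1 if x[2] == x[0] else -1,           # x3 == x1
]
w = [1, 0, 1]  # hypothetical specification weights w^opt
z = [0, 1, 1]  # hypothetical implementation weights z^opt

space = list(itertools.product([0, 1], repeat=3))
energies = {x: energy(x, weak, w, z) for x in space}
x_e = min(space, key=energies.get)  # Eq. (73) by exhaustive search (first minimizer)
probs = boltzmann(energies, kT=0.5)
```

In this toy the two degenerate minima have $C^{\mathrm{opt}} = -4$, and a state one gap $\Delta = 2$ higher is suppressed relative to them by exactly $e^{-\Delta/(k_B T)} = e^{-4}$, mirroring the argument above. Exhaustive enumeration scales as $2^{N_{\mathrm{in}}+N_{\mathrm{out}}}$, which is precisely the cost the adiabatic search is meant to avoid.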



E. Choosing the weak classifiers 

Written in the form $\sum_{i=1}^{N}(w_i^{\mathrm{opt}} + z_i^{\mathrm{opt}})h_i(x)$, the energy
function $C^{\mathrm{opt}}(x)$ is too general, since we have not yet specified
the weak classifiers $h_i(x)$. However, we are free to choose
these so as to mold $C^{\mathrm{opt}}(x)$ into a Hamiltonian that is
physically implementable in AQC.

Suppose, e.g., that $h_i(x)$ measures a Boolean relationship
defined by a function $f_i : \{0,1\}^{\ell} \to \{0,1\}$ between several
bits of the input-output vector; $x_k = \mathrm{bit}_k(x)$, the $k$th bit of
$x \in \mathcal{V}$. For example,

$$h_i(x) = \left(x_{i_3} == f_i(x_{i_1}, x_{i_2})\right), \qquad (76)$$

where "$a == b$" evaluates to 1 if $a = b$ or to 0 if $a \neq b$. Here
$i_1$ and $i_2$ are the positions of two bits from the input vector $x_{\mathrm{in}}$
and $i_3$ is the position of a bit from the output vector $x_{\mathrm{out}}$, so
that $h_i$ measures a correlation between inputs and outputs.
The choice of this particular form for the weak classifiers
is physically motivated, as it corresponds to at most three-body
interactions between qubits, which can all be reduced
to two-body interactions by the addition of ancilla qubits
(see below). Let us enumerate these weak classifiers. The
number of different Boolean functions $f_i$ is $2^{2^{\ell}}$ [49].⁴ Much
more efficient representations are possible under reasonable
assumptions [50], but for the time being we shall not concern
ourselves with these. In the example of the classifier (76) there
are $N_{\mathrm{in}}(N_{\mathrm{in}} - 1)$ input bit combinations for each of the $N_{\mathrm{out}}$
output bits. The number of different Boolean functions in this
example, where $\ell = 2$, is $2^{2^2} = 16$. Thus the dimension of
the "dictionary" of weak classifiers is

$$N = 16\, N_{\mathrm{in}}(N_{\mathrm{in}} - 1)\, N_{\mathrm{out}} \qquad (77)$$

for the case of Eq. (76).

TABLE I: All 16 Boolean functions $f_i$ of two binary variables, and their implementation form in terms of the Pauli matrices
$Z_{i_j}$, acting on single qubits or pairs of qubits $i_j$, $j \in \{1,2,3\}$. The subscript $a$ in the Implementation Form column denotes an
ancilla qubit, tied to qubits $i_1$ and $i_2$ via $x_a = x_{i_1}x_{i_2}$, used to reduce all qubit interactions to at most two-body.
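The counting behind Eq. (77) can be checked by brute force. The sketch below (hypothetical helper names) enumerates all two-input Boolean functions as truth tables and builds one weak classifier of the form (76) for every ordered pair of distinct input bits, every output bit, and every function. It assumes the input bits occupy the first $N_{\mathrm{in}}$ positions of $x$ and the output bits the last $N_{\mathrm{out}}$.

```python
import itertools

def boolean_functions(ell):
    """Yield all 2**(2**ell) Boolean functions of ell bits, each as a truth-table lookup."""
    inputs = list(itertools.product([0, 1], repeat=ell))
    for outputs in itertools.product([0, 1], repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        # Bind this table as a default argument so each function keeps its own copy.
        yield lambda *bits, t=table: t[bits]

def build_dictionary(n_in, n_out):
    """One weak classifier per (i1, i2, i3, f): ordered input pair, output bit, function."""
    funcs = list(boolean_functions(2))
    return [
        (i1, i2, i3, f)
        for i1, i2 in itertools.permutations(range(n_in), 2)  # N_in (N_in - 1) pairs
        for i3 in range(n_in, n_in + n_out)                   # output-bit positions
        for f in funcs
    ]

def weak_classify(entry, x):
    """Eq. (76): 1 if x[i3] == f(x[i1], x[i2]), else 0."""
    i1, i2, i3, f = entry
    return int(x[i3] == f(x[i1], x[i2]))
```

With $N_{\mathrm{in}} = 6$ and $N_{\mathrm{out}} = 3$, the bit counts of the sample problem in Section VI, the dictionary has $16 \cdot 6 \cdot 5 \cdot 3 = 1440$ entries.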

4 Any Boolean function of $\ell$ variables can be uniquely expanded in the
form $f_i(x_1, \ldots, x_\ell) = \sum_{a=0}^{2^\ell - 1} \epsilon_{ia} s_a$, where $\epsilon_{ia} \in \{0,1\}$ and $s_a$ are the
$2^\ell$ "simple" Boolean functions $s_0 = \bar{x}_1\bar{x}_2\cdots\bar{x}_\ell$, $s_1 = x_1\bar{x}_2\cdots\bar{x}_\ell$, ...,
$s_{2^\ell - 1} = x_1 x_2 \cdots x_\ell$, where $\bar{x}$ denotes the negation of the bit $x$. Since each
$\epsilon_{ia}$ can assume one of two values, there are $2^{2^\ell}$ different Boolean functions.

We wish to find a two-local quantum implementation for
each $h_i(x)$ in the dictionary. It is possible to find a two-local
implementation for any three-local Hamiltonian using
so-called "perturbation gadgets", or three ancilla bits for each
three-local term included [51], but rather than using the general
method we rely on a special case which will allow us to
use only one ancilla bit per three-local term. We first devise
an intermediate form function using products of the same
bits $x_i \in \{0,1\}$ used to define the logical behavior of
each weak classifier. This function will have a value of 1
when the Boolean relationship specified for $h_i(x)$ is true,
and $-1$ otherwise. For example, consider function number
8, $x_{i_3} == x_{i_1} \wedge x_{i_2}$, the AND function. Its intermediate
form is $4x_{i_1}x_{i_2}x_{i_3} - 2(x_{i_3} + x_{i_1}x_{i_2}) + 1$. For the bit values
$(x_{i_1}, x_{i_2}, x_{i_3}) = (0,0,0)$, the value of the intermediate
function is 1, and the Boolean form is true: AND yields
0. If instead we had the bit values $(x_{i_1}, x_{i_2}, x_{i_3}) = (0,0,1)$,
the intermediate form would yield $-1$, and the Boolean form
would be false, because the value for $x_{i_3}$ does not follow from
the values for $x_{i_1}$ and $x_{i_2}$.
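The intermediate form of the AND classifier can be checked exhaustively. As an observation of ours rather than a construction stated in the text, any classifier $x_{i_3} == f(x_{i_1}, x_{i_2})$ has the intermediate form $(2x_{i_3} - 1)(2f(x_{i_1}, x_{i_2}) - 1)$, which is $+1$ exactly when the relation holds; for $f = \wedge$ it expands to the polynomial given above. The sketch verifies both statements on all eight bit patterns.

```python
import itertools

def intermediate_and(x1, x2, x3):
    """Intermediate form of weak classifier 8 (x3 == x1 AND x2), as given in the text."""
    return 4 * x1 * x2 * x3 - 2 * (x3 + x1 * x2) + 1

def intermediate_generic(f, x1, x2, x3):
    """+1 when x3 == f(x1, x2) and -1 otherwise, for any Boolean f."""
    return (2 * x3 - 1) * (2 * f(x1, x2) - 1)

def check_and_form():
    """True if both forms agree with the truth value of x3 == x1 AND x2 on all inputs."""
    for x1, x2, x3 in itertools.product([0, 1], repeat=3):
        expected = 1 if x3 == (x1 & x2) else -1
        if intermediate_and(x1, x2, x3) != expected:
            return False
        if intermediate_generic(lambda a, b: a & b, x1, x2, x3) != expected:
            return False
    return True
```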

The two-body implementation form is obtained in two steps
from the intermediate form. First, an ancilla bit tied to the
product of the two input bits, $x_a = x_{i_1}x_{i_2}$, is substituted
into any intermediate form expressions involving three-bit
products. This is permissible because such an ancilla can
indeed be created by introducing a penalty into the final
Hamiltonian for any states in which the ancilla bit is not equal
to the product $x_{i_1}x_{i_2}$. We detail this method below. Then, the
modified intermediate expression is translated into a form that
uses bits valued as $x'_i \in \{-1, 1\}$ rather than $x_i \in \{0, 1\}$ using
the equivalence $x_i = (1 - x'_i)/2$, i.e., $x'_i = (-1)^{x_i}$. The modified intermediate form
is now amenable to using the implemented qubits. Note that
the Pauli matrix $Z_i$ acts on a basis ket $|x\rangle$ as

$$Z_i|x\rangle = (-1)^{\mathrm{bit}_i(x)}|x\rangle. \qquad (78)$$

This means that we can substitute $Z_i$ for $x'_i$ and $Z_i \otimes Z_j$ for
$x'_i x'_j$ in the intermediate form, resulting in the implementation
form given in Column 4 of Table I. Some weak classifiers do
not involve three-bit interactions. Their implementation forms
were devised directly, a simple process when there is no need
for inclusion of an ancilla.
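The substitution step can be checked numerically. Because every operator involved is diagonal in the computational basis, a product of $Z$ operators can be represented by its eigenvalue on each basis state, Eq. (78). The sketch below (pure Python, hypothetical helper names) confirms that the modified intermediate form of function 8, $4x_a x_{i_3} - 2(x_{i_3} + x_a) + 1$, equals $x'_a x'_{i_3}$ with $x' = 1 - 2x$, and hence the diagonal of $Z_a \otimes Z_{i_3}$.

```python
import itertools

def z_eigenvalue(bits, sites):
    """Eigenvalue of the product of Z_k over `sites` on basis state |bits>, per Eq. (78)."""
    value = 1
    for k in sites:
        value *= (-1) ** bits[k]
    return value

def to_spin(x):
    """Map a bit x in {0, 1} to the Z eigenvalue x' = (-1)**x = 1 - 2x."""
    return 1 - 2 * x

def modified_intermediate_8(xa, x3):
    """Function 8 after substituting the ancilla x_a = x_i1 * x_i2."""
    return 4 * xa * x3 - 2 * (x3 + xa) + 1

# Basis states are ordered (x_a, x_i3); compare all four of them.
agreement = all(
    modified_intermediate_8(xa, x3)
    == to_spin(xa) * to_spin(x3)
    == z_eigenvalue((xa, x3), (0, 1))
    for xa, x3 in itertools.product([0, 1], repeat=2)
)
```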






We have reduced the dictionary functions from three-bit
to two-bit interactions by adding an ancilla bit to represent
the product of the two input bits involved in the function.
Therefore, the maximum number of qubits needed to implement
this set of weak classifiers on a quantum processor
is $Q = N_{\mathrm{in}} + N_{\mathrm{out}} + N_{\mathrm{in}}(N_{\mathrm{in}} - 1)/2$, with one ancilla per pair of input bits. In practice, it is likely to be
significantly less because not every three-bit correlation will
be relevant to a given classification problem.

Let us now discuss how the penalty function is introduced.
For example, consider again the implementation of weak
classifier function $i = 8$, whose intermediate form involves
three-qubit products, which we reduced to two-qubit interactions by
including $x_a$.

We ensure that $x_a$ does indeed represent the product it is
intended to by making the function a sum of two terms: the
product of the ancilla qubit and the remaining qubit from the
original product, and a term that adds a penalty if the ancilla is
not in fact equal to the product of the two qubits it is meant to
represent, in this case $f_{\mathrm{penalty}} = x_{i_1}x_{i_2} - 2(x_{i_1} + x_{i_2})x_a + 3x_a$.
In the case where $(x_{i_1}, x_{i_2}, x_a) = (1, 0, 0)$, $f_{\mathrm{penalty}} = 0$, but in
a case where $x_a$ does not represent the intended product, such
as $(x_{i_1}, x_{i_2}, x_a) = (1, 0, 1)$, $f_{\mathrm{penalty}} = 1$. In fact, the penalty
function behaves as follows:

$$f_{\mathrm{penalty}} \begin{cases} = 0 & \text{if } x_a = x_{i_1}x_{i_2}, \\ > 0 & \text{otherwise}. \end{cases} \qquad (79)$$
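Eq. (79) can be confirmed by tabulating $f_{\mathrm{penalty}}$ on all eight assignments of $(x_{i_1}, x_{i_2}, x_a)$. The short sketch below does this; the names `table` and `consistent` are illustrative.

```python
import itertools

def f_penalty(x1, x2, xa):
    """Penalty tying the ancilla x_a to the product x_i1 * x_i2."""
    return x1 * x2 - 2 * (x1 + x2) * xa + 3 * xa

table = {bits: f_penalty(*bits) for bits in itertools.product([0, 1], repeat=3)}

# The four assignments in which the ancilla correctly equals x1 * x2.
consistent = {b for b in table if b[2] == b[0] * b[1]}
```

The penalty vanishes on the four consistent assignments and is at least 1 elsewhere: it equals 1 at $(1,0,1)$, $(0,1,1)$ and $(1,1,0)$, and 3 at $(0,0,1)$. The text later suggests scaling ancilla penalties up so that such states sit well above the error candidates.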



In the end, we have the modified intermediate form $f_8 =
4x_a x_{i_3} - 2(x_{i_3} + x_a) + 1 + f_{\mathrm{penalty}}$, which involves only
two-qubit interactions. This would be implemented on the quantum
computer as the sum of two Hamiltonian terms:

$$H_8 = Z_a \otimes Z_{i_3} \qquad (80)$$

from the implementation column of Table I and

$$H_{\mathrm{penalty}}(i_1, i_2) = \tfrac{1}{4} Z_{i_1} \otimes Z_{i_2} - \tfrac{1}{2} Z_{i_1} \otimes Z_a - \tfrac{1}{2} Z_{i_2} \otimes Z_a + \tfrac{1}{4} Z_{i_1} + \tfrac{1}{4} Z_{i_2} - \tfrac{1}{2} Z_a + \tfrac{3}{4}\mathbb{1}, \qquad (81)$$

the implementation form of $f_{\mathrm{penalty}}$, so a Hamiltonian to
find input-output vectors classified negatively by this weak
classifier would be

$$H_{\mathrm{weak}} = H_8 + H_{\mathrm{penalty}}. \qquad (82)$$
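As a consistency check on Eqs. (80) to (82), the sketch below evaluates the diagonal Hamiltonians on every basis state using Eq. (78). It assumes the convention $x'_i = (-1)^{x_i}$ for the $Z$ eigenvalues, under which Eq. (81), including its constant $3/4$, reproduces $f_{\mathrm{penalty}}$ exactly. It then checks that the lowest-energy states of $H_{\mathrm{weak}}$ are exactly the ancilla-consistent assignments that violate $x_{i_3} = x_{i_1} \wedge x_{i_2}$. Helper names are hypothetical.

```python
import itertools

def z_eigenvalue(bits, sites):
    """Eigenvalue of the product of Z_k over `sites` on |bits>, per Eq. (78)."""
    value = 1
    for k in sites:
        value *= (-1) ** bits[k]
    return value

def h_penalty(x1, x2, xa):
    """Diagonal entry of H_penalty(i1, i2), Eq. (81). Qubit order: (i1, i2, a)."""
    b = (x1, x2, xa)
    return (0.25 * z_eigenvalue(b, (0, 1)) - 0.5 * z_eigenvalue(b, (0, 2))
            - 0.5 * z_eigenvalue(b, (1, 2)) + 0.25 * z_eigenvalue(b, (0,))
            + 0.25 * z_eigenvalue(b, (1,)) - 0.5 * z_eigenvalue(b, (2,)) + 0.75)

def f_penalty(x1, x2, xa):
    """The classical penalty function of Eq. (79)."""
    return x1 * x2 - 2 * (x1 + x2) * xa + 3 * xa

def h_weak(x1, x2, x3, xa):
    """Diagonal entry of H_weak = H_8 + H_penalty, Eq. (82), with H_8 = Z_a Z_i3."""
    return z_eigenvalue((x3, xa), (0, 1)) + h_penalty(x1, x2, xa)

states = list(itertools.product([0, 1], repeat=4))  # (x_i1, x_i2, x_i3, x_a)
ground_energy = min(h_weak(*s) for s in states)
ground_states = [s for s in states if h_weak(*s) == ground_energy]
```

Every ground state has energy $-1$: the penalty is zero and $H_8$ reports that $x_{i_3}$ disagrees with $x_{i_1} \wedge x_{i_2}$. Any ancilla-violating state costs at least $+1$ in penalty and so lies at energy $\geq 0$.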



When the strong classifier is implemented as a whole, 
multiple weak classifiers with weight 1 may use the same two 
input bits, and therefore share an ancilla bit that is the product 
of those input bits. When this is the case, it is sufficient to add 
the penalty function to the final Hamiltonian once, though the 
ancilla is used multiple times. 

The inclusion of ancilla qubits tied to products of other 
qubits and their associated penalties need not interfere with 
the solution of the V&V problem, although the ancilla penalty 
terms must appear in the same final Hamiltonian as this 
optimization. If the ancilla penalty terms are made reasonably 
large, they will put any states in which the ancillas do not 
represent their intended products (states which are in fact 
outside of V) far above the levels at which errors are found. For 
instance, consider an efficient, nearly optimal strong classifier
closely approximating the conditions set forth in Section IV.
Such a classifier makes its decision on the strength of two
simultaneously true votes. If two such classifiers are added 
together, as in the verification problem, the lowest energy 
levels will have an energy near —4. If the penalty on a 
forbidden ancilla state is more than a reasonable 4 units, such 
a state should be well clear of the region where errors are 
found. 

This varied yet correlation-limited set of weak classifiers fits
nicely with the idea of tracking intermediate spaces [Eq. (11)],
where we can use an intermediate space $\mathcal{I}_j$ to construct a set of
weak classifiers feeding into the next intermediate space $\mathcal{I}_{j+1}$.
This is further related to an obvious objection to the above
classifiers, which is that they ignore any correlations involving
four or more bits that are not accompanied by one-, two-, or three-bit correlations.
By building a hierarchy of weak classifiers for intermediate
spaces, such correlations can hopefully be accounted for as
they build up, by keeping track instead of one-, two-, and three-bit
terms as the program runs.

F. QUBO-AQC quantum parallel testing 

With the choice of Boolean functions for the weak
classifiers, the quantum implementation of the energy function
$C^{\mathrm{opt}}(x)$ [Eq. (74)] becomes

$$H_F^{\mathrm{test}} = \sum_{i=1}^{N}\left(w_i^{\mathrm{opt}} + z_i^{\mathrm{opt}}\right) H_i + \sum_{j<k} H_{\mathrm{penalty}}(j, k), \qquad (83)$$

where $H_i$ denotes the implemented form given in the Implementation Form
column of Table I and the indices $j, k \in \{1, \ldots, N_{\mathrm{in}}\}$ denote
all possible pairings of input qubits tied to ancillas. The
ground state of $H_F^{\mathrm{test}}$, which corresponds to the optimal weight
sets $w^{\mathrm{opt}}$ and $z^{\mathrm{opt}}$ derived from the set of weak classifiers
detailed in Subsection V-E, is an erroneous state, which, by
construction, is not a member of the training set $\mathcal{T}$.

How do we construct the AQC such that all input-output
pairs $x$ are tested in parallel? This is a consequence of the
adiabatic interpolation Hamiltonian (26), and in particular the
initial Hamiltonian $H_I$ of the type given in Eq. (21). The
ground state of this positive semidefinite $H_I$ is an equal
superposition over all input-output vectors, i.e., $H_I \sum_{x \in \mathcal{V}} |x\rangle = 0$,
and hence when we implement the AQC every possible $x$ starts
out as a candidate for the ground state. The final (Boltzmann)
distribution of observed states strongly favors the manifold of
low-energy states, and by design these will be implemented
erroneous states, if they exist.
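The statement $H_I \sum_x |x\rangle = 0$ can be illustrated with one common choice of initial Hamiltonian. The sketch assumes the transverse-field form $H_I = \sum_i (1 - X_i)/2$, which is an assumption here: the text only requires a positive semidefinite $H_I$ of the referenced type. It applies $H_I$ to a state vector stored as a plain list, using the fact that $X_i$ flips bit $i$ of the basis index. Function names are hypothetical.

```python
import math

def apply_h_initial(psi, n):
    """Return H_I psi for the assumed H_I = sum_i (1 - X_i)/2 on n qubits.

    psi is a list of 2**n amplitudes indexed by the integer encoding of x;
    (X_i psi)[idx] equals psi[idx ^ (1 << i)] because X_i flips bit i.
    """
    out = [0.0] * len(psi)
    for i in range(n):
        mask = 1 << i
        for idx, amp in enumerate(psi):
            out[idx] += 0.5 * (amp - psi[idx ^ mask])
    return out

def uniform_superposition(n):
    """Normalized equal superposition over all 2**n basis states."""
    dim = 2 ** n
    return [1 / math.sqrt(dim)] * dim
```

Each term $(1 - X_i)/2$ has eigenvalues 0 and 1, so this $H_I$ is positive semidefinite, and the uniform superposition, annihilated by every term, is its unique ground state; a single basis state such as $|000\rangle$ instead has energy $n/2$.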

VI. Sample problem implementation 

In order to explore the practicality of our two-step adiabatic 
quantum approach to finding software errors, we have applied 
the algorithm to a program of limited size containing a 
logical error. We did this by calculating the results of the 
algorithm assuming perfect adiabatic quantum optimization 
steps on a processor with few (N < 30) available qubits. 
Preliminary characterizations of the accuracy achievable using 
such an algorithm given a set of weak classifiers with certain 
characteristics are also presented. 
A. The Triplex Monitor Miscompare problem

The problem we chose to implement is a toy model of 
program design practices used in mission critical software 
systems.⁵ This program monitors a set of three redundant
variables $\{A_t, B_t, C_t\}$ for internal consistency. The variables
could represent, e.g., sensor inputs, control signals, or par- 
ticularly important internal program values. If one value is 
different from the other two over a predetermined number of 
snapshots in time t, a problem in the system is indicated and 
the value of the two consistent redundant variables is propa- 
gated as correct. Thus the program is supposed to implement 
a simple majority-vote error-detection code. 

We consider only the simplest case of two time snapshots, 
i.e., $t = 1, 2$. As just explained, a correct implementation
of the monitoring routine should fail a redundant variable 
A, B, or C if that same variable miscompares with both 
of the other variables in each of the two time frames. The 
erroneous implemented program we shall consider has the 
logical error that, due to a mishandled internal implementation 
of the miscompare tracking over multiple time frames, it fails 
a redundant variable any time there has been a miscompare in 
both time frames, even if the miscompare implicated a different 
variable in each time frame. 

In order to facilitate quantum V&V using the smallest 
possible number of qubits, we assume the use of classical 
preprocessing to reduce the program to its essential structure. 
The quantum algorithm does not look at the values of the 
three redundant variables in each time frame. Instead, it 
sees three logical bits per snapshot, telling it whether each 
pair of variables is equal. This strategy is also reflected in 
the program outputs, which are three logical bits indicating 
whether or not each redundant variable is deemed correct by 
the monitoring routine. Thus there are nine logical bits, as 
specified in Table II.

Bit      Significance
$x_1$    $A_1 \neq B_1$
$x_2$    $B_1 \neq C_1$
$x_3$    $A_1 \neq C_1$
$x_4$    $A_2 \neq B_2$
$x_5$    $B_2 \neq C_2$
$x_6$    $A_2 \neq C_2$
$x_7$    A failed
$x_8$    B failed
$x_9$    C failed

TABLE II: Logical bits and their significance in terms of
variable comparison in the Triplex Miscompare problem.

In terms of Boolean logic, the two behaviors are as follows: 
Program Specification 

$$x_7 = x_1 \wedge x_3 \wedge x_4 \wedge x_6, \qquad (84a)$$
$$x_8 = x_1 \wedge x_2 \wedge x_4 \wedge x_5, \qquad (84b)$$
$$x_9 = x_2 \wedge x_3 \wedge x_5 \wedge x_6, \qquad (84c)$$

5 We are grateful to Greg Tallant from the Lockheed Martin Corporation 
for providing us with this problem as an example of interest in flight control 
systems. 



i.e., a variable is flagged as incorrect if and only if it has 
miscompared with all other variables in all time frames. 
Erroneous Program Implementation 

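$$x_7 = \big((x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_1 \wedge x_3)\big) \wedge x_4 \wedge x_6, \qquad (85a)$$

$$x_8 = \big((x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_1 \wedge x_3)\big) \wedge x_4 \wedge x_5, \qquad (85b)$$

$$x_9 = \big((x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_1 \wedge x_3)\big) \wedge x_5 \wedge x_6, \qquad (85c)$$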

i.e., a variable is flagged as incorrect if it miscompares with 
the other variables in the final time frame and if any variable 
has miscompared with the others in the previous time frame. 
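Eqs. (84) and (85) are small enough to compare exhaustively. The sketch below enumerates all $2^6$ assignments of the comparison bits $x_1, \ldots, x_6$ and lists the full 9-bit input-output vectors on which the erroneous implementation disagrees with the specification. It does not check whether a pattern is physically consistent (e.g., $x_1 = x_2 = 0$ but $x_3 = 1$ is impossible). This classical enumeration is only a check on the toy problem; it is exactly the exhaustive search that the quantum procedure is meant to avoid for larger programs.

```python
import itertools

def specification(x):
    """Eq. (84): comparison bits (x1..x6) -> failure flags (x7, x8, x9)."""
    x1, x2, x3, x4, x5, x6 = x
    return (x1 & x3 & x4 & x6, x1 & x2 & x4 & x5, x2 & x3 & x5 & x6)

def implementation(x):
    """Eq. (85): the erroneous implementation."""
    x1, x2, x3, x4, x5, x6 = x
    # "Some variable miscompared in time frame 1": at least two of x1, x2, x3 set.
    earlier = (x1 & x2) | (x2 & x3) | (x1 & x3)
    return (earlier & x4 & x6, earlier & x4 & x5, earlier & x5 & x6)

inputs = list(itertools.product([0, 1], repeat=6))
# Full 9-bit vectors that the implementation produces but the specification forbids.
errors = [x + implementation(x) for x in inputs if implementation(x) != specification(x)]
```

For instance, if $B$ miscompares in the first frame and $A$ in the second, the input $(1,1,0,1,0,1)$ yields the error vector $(1,1,0,1,0,1,1,0,0)$: the implementation fails $A$ although the specification fails no variable, which is the logical error described in Subsection VI-A.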

B. Implemented algorithm 

The challenges before us are to train classifiers to recog- 
nize the behavior of both the program specification and the 
erroneous implementation, and then to use those classifiers to 
find the errors. These objectives have been programmed into 
a hybrid quantum-classical algorithm using the quantum
techniques described in Sections III and V and classical strategy
refinements based on characteristics of available resources (for 
example, the accuracy of the set of available weak classifiers). 
The performance of this algorithm has been tested through 
computational studies using a classical optimization routine in 
place of adiabatic quantum optimization calls. 

The algorithm takes as its inputs two training sets, one 
for the specification classifier and one for the implementation 
classifier. The two strong classifiers are constructed using the 
same method, one after the other, consulting the appropriate 
training set. 

When constructing a strong classifier, the algorithm first 
evaluates the performance of each weak classifier in the 
dictionary over the training set. Weak classifiers with poor per- 
formance, typically those with over 40% error, are discarded. 
The resulting, more accurate dictionary is fed piecewise into 
the quantum optimization algorithm. 

Ideally, the adiabatic quantum optimization using the final 
Hamiltonian (25) would take place over the set of all weak
classifiers in the modified, more accurate dictionary. However, 
the reality of quantum computation for some time to come 
is that the number of qubits available for processing will be 
smaller than the number of weak classifiers in the accurate 
dictionary. This problem is addressed by selecting random 
groups of Q classifiers (the number of available qubits) to be 
optimized together. An initial random group of $Q$ classifiers
is selected, the optimal weight vector $q^{\mathrm{opt}}$ is calculated by
classically finding the ground state of $H_F$, and the weak
classifiers which receive weight 0 are discarded. The resulting
vacancies are filled in with weak classifiers randomly selected
from the set of those which have not yet been considered, until 
all Q classifiers included in the optimization return a weight 
of 1. This procedure is repeated until all weak classifiers in 
the accurate dictionary have been considered, at which time 
the most accurate group of Q generated in this manner is 
accepted as the strong classifier for the training set in question. 
Clearly, alternative strategies for combining subsets of Q weak 
classifiers could be considered, such as genetic algorithms, but 
this was not attempted here. 
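The random-group procedure just described can be sketched as follows. This is an illustrative reading of the text, not the authors' code. `ground_state` is a caller-supplied stand-in for the (classical or adiabatic) minimization of the weight-selection Hamiltonian and must return one binary weight per classifier in the group. `accuracy` scores a candidate group on the training set. Groups that never reach full size because the dictionary runs out are still kept as candidates, consistent with the strong classifiers of "16 or fewer" members reported in Subsection VI-C.

```python
import random

def select_strong_classifier(accurate, Q, ground_state, accuracy, rng):
    """Iteratively optimize random groups of at most Q weak classifiers.

    accurate     -- ids of weak classifiers that survived the accuracy filter
    ground_state -- callable(group) -> list of 0/1 weights (stand-in for the AQC call)
    accuracy     -- callable(group) -> score of the strong classifier (higher is better)
    """
    pool = list(accurate)
    rng.shuffle(pool)                 # draw "not yet considered" classifiers at random
    candidates, group = [], []
    while pool or group:
        while len(group) < Q and pool:
            group.append(pool.pop())  # fill the vacancies
        weights = ground_state(group)
        kept = [h for h, q in zip(group, weights) if q == 1]
        if len(kept) == len(group) or not pool:
            if kept:
                candidates.append(kept)  # converged (or dictionary exhausted)
            group = []                   # start a fresh random group
        else:
            group = kept                 # discard weight-0 members and refill
    return max(candidates, key=accuracy, default=[])

# Toy usage: a mock ground_state that keeps only even-numbered classifiers,
# and an "accuracy" that simply prefers larger groups.
example = select_strong_classifier(
    range(10), 3, lambda g: [1 if h % 2 == 0 else 0 for h in g], len, random.Random(0))
```

Passing the random generator explicitly makes runs reproducible, which is convenient when, as in Subsection VI-C, the best of many randomized runs is kept.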
Fig. 6: Error fractions in 16-member specification classifier calculations; Left: average over 50. Right: best of 50.
Fig. 7: Error fractions in 16-member implementation classifier calculations; Left: average over 50. Right: best of 50.



Both the specification and implementation strong classifiers
are generated in this way, resulting in

$$R^{\mathrm{opt}}(x) = \sum_{i=1}^{N} w_i^{\mathrm{opt}} h_i(x), \qquad (86)$$
$$U^{\mathrm{opt}}(x) = \sum_{i=1}^{N} z_i^{\mathrm{opt}} h_i(x), \qquad (87)$$

where $w_i^{\mathrm{opt}}$ and $z_i^{\mathrm{opt}}$ take the value 1 if the corresponding
weak classifier $h_i(x)$ is selected using the iterative procedure
described in the preceding paragraph, and are zero otherwise.
This is the same structure as that seen in Eqs. (70) and (71),
but with different vectors $w$ and $z$ due to the lack of available
qubits to perform a global optimization over the accurate
dictionary.



The two strong classifiers of Eqs. (86) and (87) are summed
as in Eq. (72) to create a final energy function that will push
errors to the bottom part of the spectrum. This is translated
to a final Hamiltonian $H_F$ as in Eq. (83), and the result
of the optimization (i.e., the ground state of this $H_F$) is
returned as the error candidate. This portion of the algorithm
makes it crucial to employ intelligent classical preprocessing
in order to keep the length of the input and output vectors as
small as possible, because each bit in the input-output vector
corresponds to a qubit, and the classical cost of finding the
ground state of $H_F$ grows exponentially with the number of
qubits.

C. Simulation results 

Our simulation efforts have focused on achieving better ac- 
curacy from the two strong classifiers. If the strong classifiers 
are not highly accurate, the second part of the algorithm, the 
quantum-parallel use of the classifiers, will not produce useful 
results. 

In the interest of pushing the limits of accuracy of the 
strong classifiers, some simulations were performed on the 
miscompare problem in a single time frame. Under this 
simplification, the program specification and implementation 
are identical (the error arises over multiple time frames), and 
indeed the simulations yield the same results for the
two classifiers (see Figs. 6 and 7, right).

The algorithm described in Subsection VI-B was run 50
times, each time producing two strong classifiers comprising
16 or fewer weak classifier members. The figure of 16 qubits
was chosen because it allowed the computations to be
performed in a reasonable amount of time on a desktop computer
while still allowing for some complexity in the makeup of
the strong classifiers. This set of 50 complete algorithmic
iterations was performed for 26 values of $\lambda$, the sparsity
parameter introduced in Eq. (18). The average percentage of
error for both strong classifiers was examined, as was the
best error fraction achieved in the 50 iterations. These two
quantities are defined as follows:

$$\mathrm{err}_{\mathrm{avg}} = \frac{1}{50}\sum_{i=1}^{50} L_i(w^{\mathrm{opt}}), \qquad (88)$$
$$\mathrm{err}_{\min} = \min_{i} L_i(w^{\mathrm{opt}}), \qquad (89)$$

where $L$ is the function that counts the total number of
incorrect classifications, Eq. (17), and $L_i$ denotes its value in the $i$th iteration. The weight vector $z^{\mathrm{opt}}$ can
be substituted for $w^{\mathrm{opt}}$ in Eqs. (88) and (89) if the strong
classifier being analyzed is the implementation rather than the
specification classifier.

Fig. 8: Error fractions of specification (left) and implementation (right) classifiers, for an increasing number of qubits.

Both the average and minimum error for the specification
and implementation classifiers are plotted in Figs. 6 and 7,
respectively, as a function of $\lambda$.

As shown in Figs. 6 and 7, while the average percent error
for both classifiers hovered around 25%, the best percent error
was consistently just below 16% for both the specification
and implementation classifiers. The consistency suggests two
things: that the randomness of the algorithm can be tamed
by looking for the best outcome over a limited number of
iterations, and that the sparsity parameter $\lambda$ did not have
much effect on classifier accuracy.

Noting in particular the lack of dependence on $\lambda$, we move
forward to examine the results of simulations on more difficult
and computationally intensive applications of the algorithm.
These results address the triplex monitor miscompare problem
exactly as described in Subsection VI-A and increase the
number of qubits as far as 26. The error fractions of the best
strong classifiers found, defined as

$$\mathrm{err}_{\min}(Q) = \min_{i} L_i(w^{\mathrm{opt}}), \quad i \in \{1, \ldots, n_{\mathrm{sim}}(Q)\}, \qquad (90)$$

where $n_{\mathrm{sim}}(Q)$ is the number of simulations performed at $Q$
qubits, are plotted in Fig. 8 as a function of the number of
qubits allowed in the simulation.



For $Q = 16$ through $Q = 23$, the error fraction shown is
for the best-performing classifier, selected from 26 iterations
of the algorithm that were calculated using different values
of $\lambda$. The consistently observed lack of dependence on $\lambda$ in
these and other simulations (such as the 50-iteration result
presented above) justifies this choice. For $Q = 24$ to $Q =
26$, it was too computationally intensive to run the algorithm
multiple times, even on a high-performance computing cluster,
so the values plotted are from a single iteration with $\lambda$ set
to zero. This was still deemed to be useful data given the
uniformity of the rest of the simulation results with respect
to $\lambda$. The dependence on the parity of the number of qubits
is a result of the potential for the strong classifier to return 0
when the number of weak classifiers in the majority vote
is even. Zero is not technically a misclassification, in that the
classifier does not place the vector $x$ in the wrong class, but neither
does the classifier give the correct class for $x$. Rather, we
obtain a "don't-know" answer from the classifier, which we
do not group with the misclassifications because it is not an
outright error in classification. It is a different, less conclusive
piece of information about the proper classification of $x$ which
may in fact be useful for other applications of such classifiers.
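The parity effect follows directly from the majority vote. Assuming, as in the intermediate forms above, that each selected weak classifier votes $\pm 1$, the sketch below returns the sign of the weighted vote, with 0 as the "don't-know" outcome, and counts misclassifications separately from don't-know answers. Function names are hypothetical.

```python
def strong_classify(votes, weights):
    """Sign of sum_i q_i h_i(x) with h_i in {-1, +1}: +1, -1, or 0 ("don't know")."""
    total = sum(q * h for q, h in zip(weights, votes))
    return (total > 0) - (total < 0)

def tally(samples, weights):
    """Return (misclassified, dont_know) counts over labelled samples [(votes, y), ...]."""
    wrong = unknown = 0
    for votes, y in samples:
        label = strong_classify(votes, weights)
        if label == 0:
            unknown += 1      # tie: not counted as a misclassification
        elif label != y:
            wrong += 1
    return wrong, unknown
```

With an odd number of unit-weight votes the total can never be zero, which is why ties, and hence don't-know answers, arise only for an even number of selected weak classifiers.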



The important conclusion to be drawn from the data quan- 
tifying strong classifier errors as a function of the number of 
available qubits is that performance seems to be improving 
only slightly as the number of available qubits increases. This 
may indicate that even with only 16 qubits, if the algorithm 
is iterated a sufficient number of times to compensate for its 
random nature, the accuracy achieved is close to the limit of 
what can be done with the current set of weak classifiers. This 
is encouraging in the context of strong classifier generation 
and sets a challenge for improving the performance of weak 
classifiers or breaking the problem into intermediate stages. 
Fig. 9: Accuracy of weak classifier dictionary on input-output vector space. White/black pixels represent a weak classifier $h_i(x)$
(all weak classifiers meeting Condition 1, indexed in order of increasing error $\eta_j$ as in Eq. (91), on vertical axis) categorizing an
input-output vector (indexed in lexicographical order on horizontal axis; there are $2^9$ vectors arising from the 9 Boolean variables
in the sample problem) correctly/incorrectly, respectively. These classifications were to determine whether an input-output pair
was correct or erroneous, i.e., we are analyzing the performance of the specification classifier.



D. Comparison of results with theory 

In light of the conditions for an ideal strong classifier
developed in Section IV, it is reasonable to ask the following
questions: How close do the weak classifiers we have for
the problem studied here come to satisfying the conditions?
What sort of accuracy can we expect our simulations to yield?
Fig. 9 and a few related calculations shed some light on the
answers. In the figure, each row of pixels represents a single 
weak classifier in the dictionary and each column represents 
one vector in the input-output space. Horizontal red lines 
divide the different levels of performance exhibited by the 
weak classifiers. White pixels represent a given weak classifier 
categorizing a given input-output vector correctly. Black pixels 
represent incorrect classifications. 

The problematic aspect of Fig. 9 is the vertical bars of white
and black exhibited by some of the more accurate classifiers. 
The method detailed above for constructing a completely 
accurate strong classifier relies on pairs of classifiers which are 
correct where others fall short, and which do not both classify 
the same input-output vector incorrectly. This is impossible to 
find in the most accurate group of weak classifiers alone, given 
that there are black bars of erroneous classifications spanning 
the entire height of the set. 

For numerical analysis of the performance of the set of
Boolean weak classifiers on the sample problem, we relate the
statistics of the dictionary on the input-output vector space $\mathcal{V}$
to Conditions 2 and 2a. Three quantities will be useful for this
analysis. The first is the error fraction of an individual weak
classifier

$$\eta_j = 1 - \frac{1}{S}\sum_{s=1}^{S}\Theta\left[y_s h_j(x_s)\right], \qquad (91)$$

that is, the fraction of the training set incorrectly classified by
the weak classifier $h_j(x)$. We use the Heaviside step function $\Theta$
to count the number of vectors correctly classified.

Next is the minimum possible overlap of correctly classified
vectors for a pair of weak classifiers over $\mathcal{V}$:

$$\phi_{jj'} = 1 - \eta_j - \eta_{j'}. \qquad (92)$$

In Eq. (92), we add the correctness fractions $(1 - \eta_j)$ and $(1 - \eta_{j'})$ of
the two weak classifiers, then subtract 1 to arrive at the fraction
of vectors that must be classified correctly by both weak
classifiers at once.

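The next definition we shall require is that of the actual
overlap of correct classifications:

$$\gamma_{jj'} = \frac{1}{S}\sum_{s=1}^{S}\Theta\left[y_s\left(h_j(x_s) + h_{j'}(x_s)\right)\right] = \phi_{jj'} + \epsilon_{jj'}. \qquad (93)$$

In Eq. (93), with $\Theta(0) = 0$, we count the fraction of vectors that are actually
classified correctly by both weak classifiers; $\epsilon_{jj'} \geq 0$ is the excess of this
actual overlap over the minimum possible overlap $\phi_{jj'}$.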

If the minimum possible and actual overlaps are the same,
i.e., $\epsilon_{jj'} = 0$, then Condition 2 holds, and the weak classifier
pair has minimum correctness overlap. Otherwise, if $\phi_{jj'} \neq
\gamma_{jj'}$, only the weaker Condition 2a is satisfied, so the weak
classifier pair has a greater than minimal correctness overlap
and a forced overlap of incorrect classifications $\epsilon_{jj'} > 0$ (see
Fig. 2) that could cancel out the correct votes of a different
weak classifier pair and cause the strong classifier to be either
incorrect or inconclusive.
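The three quantities of Eqs. (91) to (93) are straightforward to compute from a table of weak-classifier outputs. In the sketch below (hypothetical function names), each weak classifier is a list of $\pm 1$ outputs over the $S$ samples, `y` holds the labels, and the Heaviside function is taken with $\Theta(0) = 0$, so a sample counts toward $\gamma_{jj'}$ only when both classifiers are correct.

```python
def error_fraction(h_row, y):
    """eta_j, Eq. (91): one minus the fraction of samples with y_s * h_j(x_s) > 0."""
    correct = sum(1 for h, ys in zip(h_row, y) if ys * h > 0)
    return 1 - correct / len(y)

def overlaps(h_j, h_k, y):
    """Return (phi, gamma, eps) for a pair of weak classifiers, Eqs. (92)-(93)."""
    phi = 1 - error_fraction(h_j, y) - error_fraction(h_k, y)  # minimum possible overlap
    gamma = sum(1 for a, b, ys in zip(h_j, h_k, y) if ys * (a + b) > 0) / len(y)
    return phi, gamma, gamma - phi                               # eps = forced shared errors

y = [1, 1, 1, 1]
overlapping = overlaps([1, 1, -1, -1], [1, -1, 1, -1], y)  # both wrong on the last sample
disjoint = overlaps([1, 1, 1, -1], [-1, 1, 1, 1], y)       # errors never coincide
```

In the first example the two classifiers share an error on one sample in four, so $\epsilon_{jj'} = 0.25$. In the second their errors are disjoint, so the pair satisfies Condition 2 with $\epsilon_{jj'} = 0$.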

Our numerical analysis of the weak classifiers satisfying
Condition 1 (having $\eta_j < 0.5$) showed that the average
correctness overlap $\gamma_{jj'}$ between any two weak classifiers
was 0.3194. The maximum correctness overlap for any pair
of weak classifiers was $\gamma_{jj'} = 0.6094$. The minimum was
$\gamma_{jj'} = 0.1563$, between two weak classifiers with respective
error fractions (amount of the training set misclassified by each
individual weak classifier) of $\eta_j = 0.4844$ and $\eta_{j'} = 0.4531$.
Compare this to the minimum possible overlap with two such
classifiers, $\phi_{jj'} = 0.0625$, and it becomes apparent that this set
of weak classifiers falls short of ideal, given that $\epsilon_{jj'} = 0.0938$
for the weak classifier pair with minimum overlap.

When only the most accurate weak classifiers ($\eta_j = 0.3906$;
above the top red horizontal line in Fig. 9) were included, the
average correctness overlap was $\gamma_{jj'} = 0.4389$, the maximum
was $\gamma_{jj'} = 0.6094$, and the minimum was $\gamma_{jj'} = 0.3594$. In
order to come up with a generous estimate for the accuracy
achievable with this group of weak classifiers, we focus on the
minimum observed correctness overlap. The minimum possible
correctness overlap for two classifiers with $\eta_j = 0.3906$
is $\phi_{jj'} = 0.2188$. With an ideal set of weak classifiers of
error $\eta_j = 0.3906$ and correctness overlap $\phi_{jj'} = 0.2188$, it
would take seven weak classifiers to construct a completely
accurate strong classifier: three pairs of two classifiers each to
cover a fraction 0.6564 of the solution space with a correctness
overlap from one of the pairs, and one more weak classifier to
provide the extra correct vote on the remaining 0.3436 fraction
of the space. Assuming that three pairs of weak classifiers with
minimum overlap and optimal relationships to the other weak
classifier pairs could be found, there will still be a significant
error due to the overlap fractions of the pairs being larger than
ideal. In fact, each pair of weak classifiers yields an error
contribution of $\epsilon_{jj'} = 0.1406$, guaranteeing that a fraction
$3\epsilon_{jj'} = 0.4218$ of the input-output vectors will be classified
incorrectly by the resulting strong classifier. This is not far
from the simulation results for odd-qubit strong classifiers
(Fig. 8, left), which suggests that the algorithm currently in use
is producing near-optimal results for the dictionary of weak
classifiers it has access to.
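The estimate above is simple arithmetic on Eqs. (92) and (93) together with the observed overlaps. It can be reproduced as follows, using the numbers quoted in the text and rounding to four decimals as there.

```python
eta = 0.3906                  # error fraction of the most accurate weak classifiers
phi_min = 1 - 2 * eta         # Eq. (92) for two classifiers of equal error
covered = 3 * phi_min         # three pairs with non-overlapping correctness overlaps
remaining = 1 - covered       # left for the seventh classifier's extra vote
gamma_min = 0.3594            # smallest observed correctness overlap in this group
eps = gamma_min - phi_min     # excess overlap per pair, Eq. (93)
guaranteed_error = 3 * eps    # forced misclassification from the three pairs

summary = {name: round(value, 4) for name, value in [
    ("phi_min", phi_min), ("covered", covered), ("remaining", remaining),
    ("eps", eps), ("guaranteed_error", guaranteed_error)]}
```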

VII. Conclusions 

We have developed a quantum adiabatic machine learning 
approach and applied it to the problem of training a quantum 
software error classifier. We have also shown how to use this 
classifier in quantum-parallel on the space of all possible input- 
output pairs of a given implemented software program P. The 
training procedure involves selecting a set of weak classifiers, 
which are linearly combined, with binary weights, into two 
strong classifiers. 

The first quantum aspect of our approach is an adiabatic 
quantum algorithm which finds the optimal set of binary 
weights as the ground state of a certain Hamiltonian. We 
presented two alternatives for this algorithm. The first, inspired 
by [5], [27], gives weight to single weak classifiers to find an
optimal set. The second algorithm for weak classifier selection 



chooses pairs of weak classifiers to form the optimal set and 
is based on a set of sufficient conditions for a completely 
accurate strong classifier that we have developed. 

The second quantum aspect of our approach is an explicit 
procedure for using the optimal strong classifiers in order 
to search the entire space of input-output pairs in quantum- 
parallel for the existence of an error in P. Such an error 
is identified by performing an adiabatic quantum evolution, 
whose manifold of low-energy final states favors erroneous 
states. 

A possible improvement of our approach involves adding 
intermediate training spaces, which track intermediate program 
execution states. This has the potential to fine-tune the weak 
classifiers, and overcome a limitation imposed by the desire 
to restrict our Hamiltonians to low-order interactions, yet still 
account for high-order correlations between bits in the input- 
output states. 

An additional improvement involves finding optimal
interpolation paths $s(t)$ [Eq. (26)] from the initial to the final
Hamiltonian [52], [53], for both the classifier training and classifier
implementation problems.

We have applied our quantum adiabatic machine learning 
approach to a problem with real-world applications in flight 
control systems, which has facilitated both algorithmic devel- 
opment and characterization of the success of training strong 
classifiers using a set of weak classifiers involving minimal bit 
correlations. 